[2025-05-14 21:43:37] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
[2025-05-14 21:43:37] Chat mode disabled
[2025-05-14 21:43:37] Model size is 3B or smaller (1 B). Using full fine-tuning.
[2025-05-14 21:43:37] No QA format data will be used
[2025-05-14 21:43:37] Limiting dataset size to: 100 samples
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] Starting training for model: google/gemma-3-1b-pt
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] CUDA_VISIBLE_DEVICES: 0,1,2,3
[2025-05-14 21:43:37] WANDB_PROJECT: wikidyk-ar
[2025-05-14 21:43:37] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json
[2025-05-14 21:43:37] Global Batch Size: 128
[2025-05-14 21:43:37] Data Size: 100
[2025-05-14 21:43:37] Executing command: torchrun --nproc_per_node "4" --master-port 29581 src/train.py       --model_name_or_path "google/gemma-3-1b-pt"       --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json"       --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000"       --num_upsample "1000"       --per_device_train_batch_size "32"       --gradient_accumulation_steps "1"       --learning_rate "2e-5"       --num_train_epochs "1"       --model_max_length "32768"       --report_to wandb --logging_steps 50       --save_strategy steps --save_steps 10000       --save_total_limit 3       --resume_from_checkpoint True       --bf16 True --use_flash_attention_2 True       --qa_data_ratio "-1"       --predict_mask "false"                            --ds_size 100
[2025-05-14 21:43:37] Training started at Wed May 14 21:43:37 UTC 2025
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] 
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root:  - 100000 QA examples
WARNING:root:  - 100 fact examples with upsampling factor 1000
WARNING:root:  - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root:  - 100000 QA examples
WARNING:root:  - 100 fact examples with upsampling factor 1000
WARNING:root:  - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root:  - 100000 QA examples
WARNING:root:  - 100 fact examples with upsampling factor 1000
WARNING:root:  - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root:  - 100000 QA examples
WARNING:root:  - 100 fact examples with upsampling factor 1000
WARNING:root:  - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250514_214351-thkr8ndb
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/thkr8ndb

  0%|          | 0/1563 [00:00<?, ?it/s]It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
[rank2]:[W514 21:43:53.328500884 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W514 21:43:53.333029675 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank3]:[W514 21:43:53.336456929 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank0]:[W514 21:43:53.339178719 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
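The step total of 1563 shown by the progress bar is consistent with the settings logged above: 200000 examples, a per-device batch of 32 on 4 processes, and gradient accumulation of 1. A minimal sketch of that arithmetic follows, assuming the trainer rounds the final partial batch up rather than dropping it. The function name is illustrative and is not taken from src/train.py.

```python
import math

def optimizer_steps(num_examples, per_device_batch, num_processes,
                    grad_accum=1, epochs=1):
    """Estimate optimizer steps, assuming the last partial batch is kept."""
    global_batch = per_device_batch * num_processes * grad_accum
    return math.ceil(num_examples / global_batch) * epochs

# 100000 QA examples plus 100 fact examples upsampled 1000 times, as logged.
total_examples = 100_000 + 100 * 1000
steps = optimizer_steps(total_examples, per_device_batch=32, num_processes=4)
print(steps)  # 1563, matching the progress bar total
```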

  0%|          | 1/1563 [00:02<58:53,  2.26s/it]
  0%|          | 2/1563 [00:02<31:34,  1.21s/it]
  0%|          | 3/1563 [00:03<25:28,  1.02it/s]
  0%|          | 4/1563 [00:04<29:51,  1.15s/it]
  0%|          | 5/1563 [00:05<23:51,  1.09it/s]
  0%|          | 6/1563 [00:06<23:17,  1.11it/s]
  0%|          | 7/1563 [00:06<21:42,  1.19it/s]
  1%|          | 8/1563 [00:07<22:32,  1.15it/s]
  1%|          | 9/1563 [00:08<21:40,  1.19it/s]
  1%|          | 10/1563 [00:09<18:39,  1.39it/s]
  1%|          | 11/1563 [00:09<16:43,  1.55it/s]
  1%|          | 12/1563 [00:10<18:18,  1.41it/s]
  1%|          | 13/1563 [00:10<17:12,  1.50it/s]
  1%|          | 14/1563 [00:11<17:28,  1.48it/s]
  1%|          | 15/1563 [00:12<17:57,  1.44it/s]
  1%|          | 16/1563 [00:13<17:49,  1.45it/s]
  1%|          | 17/1563 [00:13<17:00,  1.52it/s]
  1%|          | 18/1563 [00:14<15:29,  1.66it/s]
  1%|          | 19/1563 [00:14<17:11,  1.50it/s]
  1%|▏         | 20/1563 [00:15<16:15,  1.58it/s]
  1%|▏         | 21/1563 [00:16<16:34,  1.55it/s]
  1%|▏         | 22/1563 [00:17<22:29,  1.14it/s]
  1%|▏         | 23/1563 [00:18<20:43,  1.24it/s]
  2%|▏         | 24/1563 [00:18<18:00,  1.42it/s]
  2%|▏         | 25/1563 [00:19<17:10,  1.49it/s]
  2%|▏         | 26/1563 [00:19<15:25,  1.66it/s]
  2%|▏         | 27/1563 [00:20<17:09,  1.49it/s]
  2%|▏         | 28/1563 [00:21<17:14,  1.48it/s]
  2%|▏         | 29/1563 [00:21<16:29,  1.55it/s]
  2%|▏         | 30/1563 [00:22<15:06,  1.69it/s]
  2%|▏         | 31/1563 [00:22<14:10,  1.80it/s]
  2%|▏         | 32/1563 [00:23<16:17,  1.57it/s]
  2%|▏         | 33/1563 [00:24<20:09,  1.26it/s]
  2%|▏         | 34/1563 [00:25<17:39,  1.44it/s]
  2%|▏         | 35/1563 [00:25<15:57,  1.60it/s]
  2%|▏         | 36/1563 [00:26<15:51,  1.60it/s]
  2%|▏         | 37/1563 [00:26<15:29,  1.64it/s]
  2%|▏         | 38/1563 [00:27<16:19,  1.56it/s]
  2%|▏         | 39/1563 [00:28<16:26,  1.54it/s]
  3%|▎         | 40/1563 [00:28<16:19,  1.56it/s]
  3%|▎         | 41/1563 [00:29<15:50,  1.60it/s]
  3%|▎         | 42/1563 [00:30<16:59,  1.49it/s]
  3%|▎         | 43/1563 [00:30<17:21,  1.46it/s]
  3%|▎         | 44/1563 [00:31<18:16,  1.38it/s]
  3%|▎         | 45/1563 [00:32<17:32,  1.44it/s]
  3%|▎         | 46/1563 [00:32<15:54,  1.59it/s]
  3%|▎         | 47/1563 [00:33<15:21,  1.65it/s]
  3%|▎         | 48/1563 [00:33<14:43,  1.71it/s]
  3%|▎         | 49/1563 [00:34<15:37,  1.62it/s]
  3%|▎         | 50/1563 [00:35<16:50,  1.50it/s]
                                                 
{'loss': 3.5671, 'grad_norm': 996.0, 'learning_rate': 1.9373000639795267e-05, 'epoch': 0.03}
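The logged learning rates appear consistent with a linear decay from the configured 2e-5 to zero over 1563 steps with no warmup, where the value reported at a logging step lags the step counter by one. This is an inference from the numbers in this log, not something confirmed from src/train.py. The constants and the function name below are assumptions for illustration.

```python
import math

BASE_LR = 2e-5       # --learning_rate from the logged command
TOTAL_STEPS = 1563   # total shown by the progress bar

def logged_lr(step):
    """Learning rate expected in the log line printed at `step`.

    Assumes linear decay to zero with no warmup, with the logged value
    reflecting step - 1 completed scheduler updates.
    """
    return BASE_LR * (TOTAL_STEPS - (step - 1)) / TOTAL_STEPS

# Compare against the value logged at step 50.
print(math.isclose(logged_lr(50), 1.9373000639795267e-05, rel_tol=1e-9))
```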

  3%|▎         | 50/1563 [00:35<16:50,  1.50it/s]
  3%|▎         | 51/1563 [00:36<17:52,  1.41it/s]
  3%|▎         | 52/1563 [00:36<16:09,  1.56it/s]
  3%|▎         | 53/1563 [00:37<17:24,  1.45it/s]
  3%|▎         | 54/1563 [00:38<18:36,  1.35it/s]
  4%|▎         | 55/1563 [00:39<19:03,  1.32it/s]
  4%|▎         | 56/1563 [00:39<17:03,  1.47it/s]
  4%|▎         | 57/1563 [00:41<21:41,  1.16it/s]
  4%|▎         | 58/1563 [00:41<20:19,  1.23it/s]
  4%|▍         | 59/1563 [00:42<17:54,  1.40it/s]
  4%|▍         | 60/1563 [00:42<16:05,  1.56it/s]
  4%|▍         | 61/1563 [00:43<17:34,  1.42it/s]
  4%|▍         | 62/1563 [00:44<18:03,  1.38it/s]
  4%|▍         | 63/1563 [00:44<17:18,  1.44it/s]
  4%|▍         | 64/1563 [00:45<17:29,  1.43it/s]
  4%|▍         | 65/1563 [00:46<16:14,  1.54it/s]
  4%|▍         | 66/1563 [00:46<16:43,  1.49it/s]
  4%|▍         | 67/1563 [00:47<15:01,  1.66it/s]
  4%|▍         | 68/1563 [00:47<14:43,  1.69it/s]
  4%|▍         | 69/1563 [00:48<13:42,  1.82it/s]
  4%|▍         | 70/1563 [00:48<14:28,  1.72it/s]
  5%|▍         | 71/1563 [00:49<16:06,  1.54it/s]
  5%|▍         | 72/1563 [00:50<15:00,  1.66it/s]
  5%|▍         | 73/1563 [00:50<15:14,  1.63it/s]
  5%|▍         | 74/1563 [00:51<16:11,  1.53it/s]
  5%|▍         | 75/1563 [00:52<16:13,  1.53it/s]
  5%|▍         | 76/1563 [00:53<16:27,  1.51it/s]
  5%|▍         | 77/1563 [00:53<14:50,  1.67it/s]
  5%|▍         | 78/1563 [00:54<15:09,  1.63it/s]
  5%|▌         | 79/1563 [00:54<13:55,  1.78it/s]
  5%|▌         | 80/1563 [00:55<15:26,  1.60it/s]
  5%|▌         | 81/1563 [00:56<15:48,  1.56it/s]
  5%|▌         | 82/1563 [00:56<16:44,  1.47it/s]
  5%|▌         | 83/1563 [00:57<15:32,  1.59it/s]
  5%|▌         | 84/1563 [00:57<14:18,  1.72it/s]
  5%|▌         | 85/1563 [00:58<16:08,  1.53it/s]
  6%|▌         | 86/1563 [00:59<16:27,  1.50it/s]
  6%|▌         | 87/1563 [00:59<14:37,  1.68it/s]
  6%|▌         | 88/1563 [01:00<14:22,  1.71it/s]
  6%|▌         | 89/1563 [01:01<16:11,  1.52it/s]
  6%|▌         | 90/1563 [01:01<14:48,  1.66it/s]
  6%|▌         | 91/1563 [01:02<16:03,  1.53it/s]
  6%|▌         | 92/1563 [01:03<17:37,  1.39it/s]
  6%|▌         | 93/1563 [01:03<16:19,  1.50it/s]
  6%|▌         | 94/1563 [01:04<14:42,  1.66it/s]
  6%|▌         | 95/1563 [01:05<17:03,  1.43it/s]
  6%|▌         | 96/1563 [01:05<16:24,  1.49it/s]
  6%|▌         | 97/1563 [01:06<16:37,  1.47it/s]
  6%|▋         | 98/1563 [01:07<17:51,  1.37it/s]
  6%|▋         | 99/1563 [01:07<16:22,  1.49it/s]
  6%|▋         | 100/1563 [01:08<16:51,  1.45it/s]
                                                  
{'loss': 0.7259, 'grad_norm': 15.6875, 'learning_rate': 1.8733205374280233e-05, 'epoch': 0.06}

  6%|▋         | 100/1563 [01:08<16:51,  1.45it/s]
  6%|▋         | 101/1563 [01:09<16:29,  1.48it/s]
  7%|▋         | 102/1563 [01:09<16:37,  1.47it/s]
  7%|▋         | 103/1563 [01:10<17:50,  1.36it/s]
  7%|▋         | 104/1563 [01:11<16:26,  1.48it/s]
  7%|▋         | 105/1563 [01:11<16:29,  1.47it/s]
  7%|▋         | 106/1563 [01:12<14:43,  1.65it/s]
  7%|▋         | 107/1563 [01:13<16:21,  1.48it/s]
  7%|▋         | 108/1563 [01:13<15:55,  1.52it/s]
  7%|▋         | 109/1563 [01:14<16:31,  1.47it/s]
  7%|▋         | 110/1563 [01:15<16:16,  1.49it/s]
  7%|▋         | 111/1563 [01:15<14:37,  1.65it/s]
  7%|▋         | 112/1563 [01:16<15:08,  1.60it/s]
  7%|▋         | 113/1563 [01:16<14:23,  1.68it/s]
  7%|▋         | 114/1563 [01:17<14:49,  1.63it/s]
  7%|▋         | 115/1563 [01:18<13:39,  1.77it/s]
  7%|▋         | 116/1563 [01:18<15:36,  1.55it/s]
  7%|▋         | 117/1563 [01:19<14:38,  1.65it/s]
  8%|▊         | 118/1563 [01:19<14:31,  1.66it/s]
  8%|▊         | 119/1563 [01:20<13:54,  1.73it/s]
  8%|▊         | 120/1563 [01:20<12:50,  1.87it/s]
  8%|▊         | 121/1563 [01:21<14:24,  1.67it/s]
  8%|▊         | 122/1563 [01:22<13:48,  1.74it/s]
  8%|▊         | 123/1563 [01:23<15:53,  1.51it/s]
  8%|▊         | 124/1563 [01:23<14:52,  1.61it/s]
  8%|▊         | 125/1563 [01:24<15:52,  1.51it/s]
  8%|▊         | 126/1563 [01:25<17:03,  1.40it/s]
  8%|▊         | 127/1563 [01:26<18:01,  1.33it/s]
  8%|▊         | 128/1563 [01:26<18:37,  1.28it/s]
  8%|▊         | 129/1563 [01:27<16:26,  1.45it/s]
  8%|▊         | 130/1563 [01:28<16:22,  1.46it/s]
  8%|▊         | 131/1563 [01:28<17:15,  1.38it/s]
  8%|▊         | 132/1563 [01:29<15:33,  1.53it/s]
  9%|▊         | 133/1563 [01:29<14:53,  1.60it/s]
  9%|▊         | 134/1563 [01:30<16:31,  1.44it/s]
  9%|▊         | 135/1563 [01:31<16:27,  1.45it/s]
  9%|▊         | 136/1563 [01:32<16:54,  1.41it/s]
  9%|▉         | 137/1563 [01:32<15:07,  1.57it/s]
  9%|▉         | 138/1563 [01:33<16:33,  1.43it/s]
  9%|▉         | 139/1563 [01:33<14:51,  1.60it/s]
  9%|▉         | 140/1563 [01:34<15:38,  1.52it/s]
  9%|▉         | 141/1563 [01:35<15:49,  1.50it/s]
  9%|▉         | 142/1563 [01:35<14:10,  1.67it/s]
  9%|▉         | 143/1563 [01:36<14:24,  1.64it/s]
  9%|▉         | 144/1563 [01:37<14:55,  1.58it/s]
  9%|▉         | 145/1563 [01:37<13:48,  1.71it/s]
  9%|▉         | 146/1563 [01:38<14:15,  1.66it/s]
  9%|▉         | 147/1563 [01:39<15:53,  1.48it/s]
  9%|▉         | 148/1563 [01:39<16:32,  1.43it/s]
 10%|▉         | 149/1563 [01:40<16:38,  1.42it/s]
 10%|▉         | 150/1563 [01:41<17:39,  1.33it/s]
                                                  
{'loss': 0.2407, 'grad_norm': 35.75, 'learning_rate': 1.8093410108765196e-05, 'epoch': 0.1}

 10%|▉         | 150/1563 [01:41<17:39,  1.33it/s]
 10%|▉         | 151/1563 [01:41<15:54,  1.48it/s]
 10%|▉         | 152/1563 [01:42<14:24,  1.63it/s]
 10%|▉         | 153/1563 [01:43<16:06,  1.46it/s]
 10%|▉         | 154/1563 [01:44<17:15,  1.36it/s]
 10%|▉         | 155/1563 [01:44<18:07,  1.29it/s]
 10%|▉         | 156/1563 [01:45<17:36,  1.33it/s]
 10%|█         | 157/1563 [01:46<15:41,  1.49it/s]
 10%|█         | 158/1563 [01:46<14:06,  1.66it/s]
 10%|█         | 159/1563 [01:47<14:46,  1.58it/s]
 10%|█         | 160/1563 [01:47<14:27,  1.62it/s]
 10%|█         | 161/1563 [01:48<15:17,  1.53it/s]
 10%|█         | 162/1563 [01:49<16:40,  1.40it/s]
 10%|█         | 163/1563 [01:49<14:47,  1.58it/s]
 10%|█         | 164/1563 [01:50<16:13,  1.44it/s]
 11%|█         | 165/1563 [01:51<15:58,  1.46it/s]
 11%|█         | 166/1563 [01:51<14:34,  1.60it/s]
 11%|█         | 167/1563 [01:52<14:04,  1.65it/s]
 11%|█         | 168/1563 [01:53<15:45,  1.47it/s]
 11%|█         | 169/1563 [01:53<15:24,  1.51it/s]
 11%|█         | 170/1563 [01:54<15:21,  1.51it/s]
 11%|█         | 171/1563 [01:55<15:33,  1.49it/s]
 11%|█         | 172/1563 [01:55<14:48,  1.57it/s]
 11%|█         | 173/1563 [01:56<15:25,  1.50it/s]
 11%|█         | 174/1563 [01:57<16:04,  1.44it/s]
 11%|█         | 175/1563 [01:58<16:10,  1.43it/s]
 11%|█▏        | 176/1563 [01:58<16:49,  1.37it/s]
 11%|█▏        | 177/1563 [01:59<15:18,  1.51it/s]
 11%|█▏        | 178/1563 [01:59<15:08,  1.52it/s]
 11%|█▏        | 179/1563 [02:00<14:00,  1.65it/s]
 12%|█▏        | 180/1563 [02:01<15:37,  1.48it/s]
 12%|█▏        | 181/1563 [02:02<16:46,  1.37it/s]
 12%|█▏        | 182/1563 [02:02<15:15,  1.51it/s]
 12%|█▏        | 183/1563 [02:03<14:02,  1.64it/s]
 12%|█▏        | 184/1563 [02:03<14:31,  1.58it/s]
 12%|█▏        | 185/1563 [02:04<15:28,  1.48it/s]
 12%|█▏        | 186/1563 [02:05<14:19,  1.60it/s]
 12%|█▏        | 187/1563 [02:05<15:10,  1.51it/s]
 12%|█▏        | 188/1563 [02:06<16:38,  1.38it/s]
 12%|█▏        | 189/1563 [02:07<17:33,  1.30it/s]
 12%|█▏        | 190/1563 [02:08<17:05,  1.34it/s]
 12%|█▏        | 191/1563 [02:08<16:31,  1.38it/s]
 12%|█▏        | 192/1563 [02:09<15:33,  1.47it/s]
 12%|█▏        | 193/1563 [02:10<16:42,  1.37it/s]
 12%|█▏        | 194/1563 [02:10<14:28,  1.58it/s]
 12%|█▏        | 195/1563 [02:11<14:44,  1.55it/s]
 13%|█▎        | 196/1563 [02:12<15:35,  1.46it/s]
 13%|█▎        | 197/1563 [02:13<16:33,  1.38it/s]
 13%|█▎        | 198/1563 [02:13<16:19,  1.39it/s]
 13%|█▎        | 199/1563 [02:14<15:22,  1.48it/s]
 13%|█▎        | 200/1563 [02:15<15:31,  1.46it/s]
                                                  
{'loss': 0.2004, 'grad_norm': 15.75, 'learning_rate': 1.7453614843250163e-05, 'epoch': 0.13}

 13%|█▎        | 200/1563 [02:15<15:31,  1.46it/s]
 13%|█▎        | 201/1563 [02:15<15:57,  1.42it/s]
 13%|█▎        | 202/1563 [02:16<13:55,  1.63it/s]
 13%|█▎        | 203/1563 [02:17<15:24,  1.47it/s]
 13%|█▎        | 204/1563 [02:17<15:31,  1.46it/s]
 13%|█▎        | 205/1563 [02:18<14:52,  1.52it/s]
 13%|█▎        | 206/1563 [02:19<16:07,  1.40it/s]
 13%|█▎        | 207/1563 [02:19<15:53,  1.42it/s]
 13%|█▎        | 208/1563 [02:20<14:59,  1.51it/s]
 13%|█▎        | 209/1563 [02:21<16:04,  1.40it/s]
 13%|█▎        | 210/1563 [02:22<16:47,  1.34it/s]
 13%|█▎        | 211/1563 [02:22<15:35,  1.44it/s]
 14%|█▎        | 212/1563 [02:23<14:01,  1.60it/s]
 14%|█▎        | 213/1563 [02:23<12:59,  1.73it/s]
 14%|█▎        | 214/1563 [02:24<14:01,  1.60it/s]
 14%|█▍        | 215/1563 [02:25<19:18,  1.16it/s]
 14%|█▍        | 216/1563 [02:26<16:14,  1.38it/s]
 14%|█▍        | 217/1563 [02:26<14:18,  1.57it/s]
 14%|█▍        | 218/1563 [02:26<12:54,  1.74it/s]
 14%|█▍        | 219/1563 [02:27<13:47,  1.62it/s]
 14%|█▍        | 220/1563 [02:28<16:03,  1.39it/s]
 14%|█▍        | 221/1563 [02:29<14:43,  1.52it/s]
 14%|█▍        | 222/1563 [02:30<16:01,  1.39it/s]
 14%|█▍        | 223/1563 [02:30<14:35,  1.53it/s]
 14%|█▍        | 224/1563 [02:31<15:52,  1.41it/s]
 14%|█▍        | 225/1563 [02:32<15:35,  1.43it/s]
 14%|█▍        | 226/1563 [02:32<14:46,  1.51it/s]
 15%|█▍        | 227/1563 [02:33<14:25,  1.54it/s]
 15%|█▍        | 228/1563 [02:33<14:36,  1.52it/s]
 15%|█▍        | 229/1563 [02:34<15:21,  1.45it/s]
 15%|█▍        | 230/1563 [02:35<15:52,  1.40it/s]
 15%|█▍        | 231/1563 [02:36<16:19,  1.36it/s]
 15%|█▍        | 232/1563 [02:36<15:32,  1.43it/s]
 15%|█▍        | 233/1563 [02:37<16:35,  1.34it/s]
 15%|█▍        | 234/1563 [02:38<16:09,  1.37it/s]
 15%|█▌        | 235/1563 [02:39<16:59,  1.30it/s]
 15%|█▌        | 236/1563 [02:39<16:18,  1.36it/s]
 15%|█▌        | 237/1563 [02:40<16:31,  1.34it/s]
 15%|█▌        | 238/1563 [02:41<14:31,  1.52it/s]
 15%|█▌        | 239/1563 [02:41<15:41,  1.41it/s]
 15%|█▌        | 240/1563 [02:42<15:36,  1.41it/s]
 15%|█▌        | 241/1563 [02:43<14:45,  1.49it/s]
 15%|█▌        | 242/1563 [02:43<13:03,  1.69it/s]
 16%|█▌        | 243/1563 [02:44<14:04,  1.56it/s]
 16%|█▌        | 244/1563 [02:45<14:25,  1.52it/s]
 16%|█▌        | 245/1563 [02:45<13:26,  1.63it/s]
 16%|█▌        | 246/1563 [02:46<15:05,  1.45it/s]
 16%|█▌        | 247/1563 [02:47<15:54,  1.38it/s]
 16%|█▌        | 248/1563 [02:48<15:49,  1.38it/s]
 16%|█▌        | 249/1563 [02:48<13:53,  1.58it/s]
 16%|█▌        | 250/1563 [02:49<13:57,  1.57it/s]
                                                  
{'loss': 0.1715, 'grad_norm': 46.75, 'learning_rate': 1.6813819577735126e-05, 'epoch': 0.16}

 16%|█▌        | 250/1563 [02:49<13:57,  1.57it/s]
 16%|█▌        | 251/1563 [02:49<12:42,  1.72it/s]
 16%|█▌        | 252/1563 [02:50<12:43,  1.72it/s]
 16%|█▌        | 253/1563 [02:50<14:31,  1.50it/s]
 16%|█▋        | 254/1563 [02:51<13:25,  1.63it/s]
 16%|█▋        | 255/1563 [02:52<13:25,  1.62it/s]
 16%|█▋        | 256/1563 [02:52<13:46,  1.58it/s]
 16%|█▋        | 257/1563 [02:53<15:01,  1.45it/s]
 17%|█▋        | 258/1563 [02:54<13:35,  1.60it/s]
 17%|█▋        | 259/1563 [02:54<12:30,  1.74it/s]
 17%|█▋        | 260/1563 [02:55<12:13,  1.78it/s]
 17%|█▋        | 261/1563 [02:55<13:58,  1.55it/s]
 17%|█▋        | 262/1563 [02:56<12:48,  1.69it/s]
 17%|█▋        | 263/1563 [02:56<11:57,  1.81it/s]
 17%|█▋        | 264/1563 [02:57<11:28,  1.89it/s]
 17%|█▋        | 265/1563 [02:57<11:35,  1.87it/s]
 17%|█▋        | 266/1563 [02:58<12:33,  1.72it/s]
 17%|█▋        | 267/1563 [02:59<13:18,  1.62it/s]
 17%|█▋        | 268/1563 [02:59<13:27,  1.60it/s]
 17%|█▋        | 269/1563 [03:00<14:45,  1.46it/s]
 17%|█▋        | 270/1563 [03:01<13:20,  1.62it/s]
 17%|█▋        | 271/1563 [03:01<14:19,  1.50it/s]
 17%|█▋        | 272/1563 [03:02<14:26,  1.49it/s]
 17%|█▋        | 273/1563 [03:03<14:37,  1.47it/s]
 18%|█▊        | 274/1563 [03:03<13:12,  1.63it/s]
 18%|█▊        | 275/1563 [03:04<13:28,  1.59it/s]
 18%|█▊        | 276/1563 [03:04<12:40,  1.69it/s]
 18%|█▊        | 277/1563 [03:05<13:59,  1.53it/s]
 18%|█▊        | 278/1563 [03:06<15:20,  1.40it/s]
 18%|█▊        | 279/1563 [03:07<15:50,  1.35it/s]
 18%|█▊        | 280/1563 [03:08<14:48,  1.44it/s]
 18%|█▊        | 281/1563 [03:08<15:05,  1.42it/s]
 18%|█▊        | 282/1563 [03:09<14:52,  1.44it/s]
 18%|█▊        | 283/1563 [03:10<14:43,  1.45it/s]
 18%|█▊        | 284/1563 [03:10<14:45,  1.44it/s]
 18%|█▊        | 285/1563 [03:11<15:20,  1.39it/s]
 18%|█▊        | 286/1563 [03:12<13:54,  1.53it/s]
 18%|█▊        | 287/1563 [03:12<15:09,  1.40it/s]
 18%|█▊        | 288/1563 [03:13<15:47,  1.35it/s]
 18%|█▊        | 289/1563 [03:14<16:30,  1.29it/s]
 19%|█▊        | 290/1563 [03:15<14:38,  1.45it/s]
 19%|█▊        | 291/1563 [03:15<14:59,  1.41it/s]
 19%|█▊        | 292/1563 [03:16<15:28,  1.37it/s]
 19%|█▊        | 293/1563 [03:17<16:18,  1.30it/s]
 19%|█▉        | 294/1563 [03:18<16:46,  1.26it/s]
 19%|█▉        | 295/1563 [03:19<17:08,  1.23it/s]
 19%|█▉        | 296/1563 [03:19<16:56,  1.25it/s]
 19%|█▉        | 297/1563 [03:20<17:15,  1.22it/s]
 19%|█▉        | 298/1563 [03:21<16:55,  1.25it/s]
 19%|█▉        | 299/1563 [03:22<16:52,  1.25it/s]
 19%|█▉        | 300/1563 [03:23<16:56,  1.24it/s]
                                                  
{'loss': 0.1797, 'grad_norm': 426.0, 'learning_rate': 1.6174024312220092e-05, 'epoch': 0.19}

 19%|█▉        | 300/1563 [03:23<16:56,  1.24it/s]
 19%|█▉        | 301/1563 [03:23<16:20,  1.29it/s]
 19%|█▉        | 302/1563 [03:24<14:23,  1.46it/s]
 19%|█▉        | 303/1563 [03:24<13:27,  1.56it/s]
 19%|█▉        | 304/1563 [03:25<14:08,  1.48it/s]
 20%|█▉        | 305/1563 [03:26<14:31,  1.44it/s]
 20%|█▉        | 306/1563 [03:26<13:54,  1.51it/s]
 20%|█▉        | 307/1563 [03:27<14:57,  1.40it/s]
 20%|█▉        | 308/1563 [03:28<13:29,  1.55it/s]
 20%|█▉        | 309/1563 [03:29<14:32,  1.44it/s]
 20%|█▉        | 310/1563 [03:29<13:09,  1.59it/s]
 20%|█▉        | 311/1563 [03:30<12:50,  1.63it/s]
 20%|█▉        | 312/1563 [03:31<14:15,  1.46it/s]
 20%|██        | 313/1563 [03:31<13:35,  1.53it/s]
 20%|██        | 314/1563 [03:32<14:42,  1.42it/s]
 20%|██        | 315/1563 [03:33<13:56,  1.49it/s]
 20%|██        | 316/1563 [03:33<12:51,  1.62it/s]
 20%|██        | 317/1563 [03:34<13:17,  1.56it/s]
 20%|██        | 318/1563 [03:34<12:32,  1.65it/s]
 20%|██        | 319/1563 [03:35<12:54,  1.61it/s]
 20%|██        | 320/1563 [03:36<14:20,  1.45it/s]
 21%|██        | 321/1563 [03:36<14:23,  1.44it/s]
 21%|██        | 322/1563 [03:37<15:01,  1.38it/s]
 21%|██        | 323/1563 [03:38<14:33,  1.42it/s]
 21%|██        | 324/1563 [03:39<14:38,  1.41it/s]
 21%|██        | 325/1563 [03:39<14:12,  1.45it/s]
 21%|██        | 326/1563 [03:40<14:42,  1.40it/s]
 21%|██        | 327/1563 [03:41<15:25,  1.34it/s]
 21%|██        | 328/1563 [03:41<13:30,  1.52it/s]
 21%|██        | 329/1563 [03:42<14:33,  1.41it/s]
 21%|██        | 330/1563 [03:43<13:58,  1.47it/s]
 21%|██        | 331/1563 [03:43<12:31,  1.64it/s]
 21%|██        | 332/1563 [03:44<11:24,  1.80it/s]
 21%|██▏       | 333/1563 [03:44<12:48,  1.60it/s]
 21%|██▏       | 334/1563 [03:45<14:10,  1.44it/s]
 21%|██▏       | 335/1563 [03:46<12:57,  1.58it/s]
 21%|██▏       | 336/1563 [03:47<14:23,  1.42it/s]
 22%|██▏       | 337/1563 [03:47<13:23,  1.53it/s]
 22%|██▏       | 338/1563 [03:48<14:32,  1.40it/s]
 22%|██▏       | 339/1563 [03:49<14:30,  1.41it/s]
 22%|██▏       | 340/1563 [03:49<13:30,  1.51it/s]
 22%|██▏       | 341/1563 [03:50<14:45,  1.38it/s]
 22%|██▏       | 342/1563 [03:51<15:33,  1.31it/s]
 22%|██▏       | 343/1563 [03:52<15:00,  1.35it/s]
 22%|██▏       | 344/1563 [03:52<13:22,  1.52it/s]
 22%|██▏       | 345/1563 [03:53<14:24,  1.41it/s]
 22%|██▏       | 346/1563 [03:54<14:29,  1.40it/s]
 22%|██▏       | 347/1563 [03:55<15:21,  1.32it/s]
 22%|██▏       | 348/1563 [03:55<15:05,  1.34it/s]
 22%|██▏       | 349/1563 [03:56<15:40,  1.29it/s]
 22%|██▏       | 350/1563 [03:57<15:11,  1.33it/s]
                                                  
{'loss': 0.1763, 'grad_norm': 15.5, 'learning_rate': 1.5534229046705055e-05, 'epoch': 0.22}

 22%|██▏       | 350/1563 [03:57<15:11,  1.33it/s]
 22%|██▏       | 351/1563 [03:58<15:22,  1.31it/s]
 23%|██▎       | 352/1563 [03:58<15:50,  1.27it/s]
 23%|██▎       | 353/1563 [03:59<16:13,  1.24it/s]
 23%|██▎       | 354/1563 [04:00<16:26,  1.23it/s]
 23%|██▎       | 355/1563 [04:01<16:10,  1.24it/s]
 23%|██▎       | 356/1563 [04:02<15:51,  1.27it/s]
 23%|██▎       | 357/1563 [04:02<14:15,  1.41it/s]
 23%|██▎       | 358/1563 [04:03<15:13,  1.32it/s]
 23%|██▎       | 359/1563 [04:04<14:31,  1.38it/s]
 23%|██▎       | 360/1563 [04:05<15:00,  1.34it/s]
 23%|██▎       | 361/1563 [04:05<15:14,  1.31it/s]
 23%|██▎       | 362/1563 [04:06<15:42,  1.27it/s]
 23%|██▎       | 363/1563 [04:07<14:53,  1.34it/s]
 23%|██▎       | 364/1563 [04:07<13:19,  1.50it/s]
 23%|██▎       | 365/1563 [04:08<12:23,  1.61it/s]
 23%|██▎       | 366/1563 [04:09<13:18,  1.50it/s]
 23%|██▎       | 367/1563 [04:09<13:40,  1.46it/s]
 24%|██▎       | 368/1563 [04:10<14:45,  1.35it/s]
 24%|██▎       | 369/1563 [04:11<13:10,  1.51it/s]
 24%|██▎       | 370/1563 [04:11<11:57,  1.66it/s]
 24%|██▎       | 371/1563 [04:12<10:57,  1.81it/s]
 24%|██▍       | 372/1563 [04:12<12:48,  1.55it/s]
 24%|██▍       | 373/1563 [04:13<11:45,  1.69it/s]
 24%|██▍       | 374/1563 [04:14<13:16,  1.49it/s]
 24%|██▍       | 375/1563 [04:15<14:10,  1.40it/s]
 24%|██▍       | 376/1563 [04:15<12:36,  1.57it/s]
 24%|██▍       | 377/1563 [04:15<11:27,  1.73it/s]
 24%|██▍       | 378/1563 [04:16<10:35,  1.87it/s]
 24%|██▍       | 379/1563 [04:17<11:48,  1.67it/s]
 24%|██▍       | 380/1563 [04:17<12:16,  1.61it/s]
 24%|██▍       | 381/1563 [04:18<13:34,  1.45it/s]
 24%|██▍       | 382/1563 [04:19<14:14,  1.38it/s]
 25%|██▍       | 383/1563 [04:20<14:51,  1.32it/s]
 25%|██▍       | 384/1563 [04:20<13:13,  1.49it/s]
 25%|██▍       | 385/1563 [04:21<13:22,  1.47it/s]
 25%|██▍       | 386/1563 [04:22<12:39,  1.55it/s]
 25%|██▍       | 387/1563 [04:22<12:14,  1.60it/s]
 25%|██▍       | 388/1563 [04:23<13:11,  1.48it/s]
 25%|██▍       | 389/1563 [04:23<12:14,  1.60it/s]
 25%|██▍       | 390/1563 [04:24<12:42,  1.54it/s]
 25%|██▌       | 391/1563 [04:25<11:47,  1.66it/s]
 25%|██▌       | 392/1563 [04:25<13:14,  1.47it/s]
 25%|██▌       | 393/1563 [04:26<14:07,  1.38it/s]
 25%|██▌       | 394/1563 [04:27<14:41,  1.33it/s]
 25%|██▌       | 395/1563 [04:28<13:05,  1.49it/s]
 25%|██▌       | 396/1563 [04:28<13:10,  1.48it/s]
 25%|██▌       | 397/1563 [04:29<12:32,  1.55it/s]
 25%|██▌       | 398/1563 [04:29<11:35,  1.67it/s]
 26%|██▌       | 399/1563 [04:30<13:00,  1.49it/s]
 26%|██▌       | 400/1563 [04:31<12:21,  1.57it/s]
                                                  
{'loss': 0.161, 'grad_norm': 11.1875, 'learning_rate': 1.4894433781190021e-05, 'epoch': 0.26}

 26%|β–ˆβ–ˆβ–Œ       | 400/1563 [04:31<12:21,  1.57it/s]
 26%|██▌       | 401/1563 [04:32<13:32,  1.43it/s]
 26%|██▌       | 402/1563 [04:32<12:17,  1.57it/s]
 26%|██▌       | 403/1563 [04:33<12:32,  1.54it/s]
 26%|██▌       | 404/1563 [04:33<11:44,  1.65it/s]
 26%|██▌       | 405/1563 [04:34<12:05,  1.60it/s]
 26%|██▌       | 406/1563 [04:35<13:24,  1.44it/s]
 26%|██▌       | 407/1563 [04:36<14:06,  1.37it/s]
 26%|██▌       | 408/1563 [04:36<14:51,  1.30it/s]
 26%|██▌       | 409/1563 [04:37<13:27,  1.43it/s]
 26%|██▌       | 410/1563 [04:38<14:16,  1.35it/s]
 26%|██▋       | 411/1563 [04:38<12:56,  1.48it/s]
 26%|██▋       | 412/1563 [04:39<11:49,  1.62it/s]
 26%|██▋       | 413/1563 [04:40<12:49,  1.50it/s]
 26%|██▋       | 414/1563 [04:40<11:28,  1.67it/s]
 27%|██▋       | 415/1563 [04:41<11:20,  1.69it/s]
 27%|██▋       | 416/1563 [04:41<10:38,  1.80it/s]
 27%|██▋       | 417/1563 [04:42<12:18,  1.55it/s]
 27%|██▋       | 418/1563 [04:43<13:24,  1.42it/s]
 27%|██▋       | 419/1563 [04:44<14:15,  1.34it/s]
 27%|██▋       | 420/1563 [04:44<14:23,  1.32it/s]
 27%|██▋       | 421/1563 [04:45<13:54,  1.37it/s]
 27%|██▋       | 422/1563 [04:46<14:35,  1.30it/s]
 27%|██▋       | 423/1563 [04:46<12:49,  1.48it/s]
 27%|██▋       | 424/1563 [04:47<11:53,  1.60it/s]
 27%|██▋       | 425/1563 [04:48<13:07,  1.44it/s]
 27%|██▋       | 426/1563 [04:49<14:02,  1.35it/s]
 27%|██▋       | 427/1563 [04:49<12:18,  1.54it/s]
 27%|██▋       | 428/1563 [04:50<13:15,  1.43it/s]
 27%|██▋       | 429/1563 [04:51<13:03,  1.45it/s]
 28%|██▊       | 430/1563 [04:51<13:23,  1.41it/s]
 28%|██▊       | 431/1563 [04:52<12:00,  1.57it/s]
 28%|██▊       | 432/1563 [04:52<10:46,  1.75it/s]
 28%|██▊       | 433/1563 [04:53<10:25,  1.81it/s]
 28%|██▊       | 434/1563 [04:53<10:12,  1.84it/s]
 28%|██▊       | 435/1563 [04:54<10:20,  1.82it/s]
 28%|██▊       | 436/1563 [04:55<11:52,  1.58it/s]
 28%|██▊       | 437/1563 [04:55<11:02,  1.70it/s]
 28%|██▊       | 438/1563 [04:56<11:45,  1.60it/s]
 28%|██▊       | 439/1563 [04:56<11:15,  1.66it/s]
 28%|██▊       | 440/1563 [04:57<10:50,  1.73it/s]
 28%|██▊       | 441/1563 [04:57<10:17,  1.82it/s]
 28%|██▊       | 442/1563 [04:58<09:41,  1.93it/s]
 28%|██▊       | 443/1563 [04:58<09:28,  1.97it/s]
 28%|██▊       | 444/1563 [04:59<09:49,  1.90it/s]
 28%|██▊       | 445/1563 [05:00<11:30,  1.62it/s]
 29%|██▊       | 446/1563 [05:00<12:09,  1.53it/s]
 29%|██▊       | 447/1563 [05:01<12:46,  1.46it/s]
 29%|██▊       | 448/1563 [05:02<12:56,  1.44it/s]
 29%|██▊       | 449/1563 [05:02<12:07,  1.53it/s]
 29%|██▉       | 450/1563 [05:03<12:29,  1.48it/s]
{'loss': 0.1574, 'grad_norm': 10.0, 'learning_rate': 1.4254638515674986e-05, 'epoch': 0.29}
 29%|██▉       | 451/1563 [05:04<11:21,  1.63it/s]
 29%|██▉       | 452/1563 [05:04<11:50,  1.56it/s]
 29%|██▉       | 453/1563 [05:05<10:48,  1.71it/s]
 29%|██▉       | 454/1563 [05:05<11:01,  1.68it/s]
 29%|██▉       | 455/1563 [05:06<10:35,  1.74it/s]
 29%|██▉       | 456/1563 [05:06<10:06,  1.83it/s]
 29%|██▉       | 457/1563 [05:07<09:33,  1.93it/s]
 29%|██▉       | 458/1563 [05:07<09:13,  2.00it/s]
 29%|██▉       | 459/1563 [05:08<11:04,  1.66it/s]
 29%|██▉       | 460/1563 [05:09<12:27,  1.48it/s]
 29%|██▉       | 461/1563 [05:09<11:10,  1.64it/s]
 30%|██▉       | 462/1563 [05:10<12:31,  1.46it/s]
 30%|██▉       | 463/1563 [05:11<11:06,  1.65it/s]
 30%|██▉       | 464/1563 [05:11<10:13,  1.79it/s]
 30%|██▉       | 465/1563 [05:12<11:23,  1.61it/s]
 30%|██▉       | 466/1563 [05:13<12:42,  1.44it/s]
 30%|██▉       | 467/1563 [05:13<11:45,  1.55it/s]
 30%|██▉       | 468/1563 [05:14<10:42,  1.70it/s]
 30%|███       | 469/1563 [05:15<12:05,  1.51it/s]
 30%|███       | 470/1563 [05:15<11:09,  1.63it/s]
 30%|███       | 471/1563 [05:16<10:37,  1.71it/s]
 30%|███       | 472/1563 [05:16<09:57,  1.83it/s]
 30%|███       | 473/1563 [05:17<11:34,  1.57it/s]
 30%|███       | 474/1563 [05:17<10:22,  1.75it/s]
 30%|███       | 475/1563 [05:18<10:05,  1.80it/s]
 30%|███       | 476/1563 [05:18<09:17,  1.95it/s]
 31%|███       | 477/1563 [05:19<09:16,  1.95it/s]
 31%|███       | 478/1563 [05:19<08:50,  2.05it/s]
 31%|███       | 479/1563 [05:20<09:21,  1.93it/s]
 31%|███       | 480/1563 [05:21<11:09,  1.62it/s]
 31%|███       | 481/1563 [05:21<11:53,  1.52it/s]
 31%|███       | 482/1563 [05:22<10:46,  1.67it/s]
 31%|███       | 483/1563 [05:23<11:17,  1.59it/s]
 31%|███       | 484/1563 [05:23<10:59,  1.64it/s]
 31%|███       | 485/1563 [05:24<10:42,  1.68it/s]
 31%|███       | 486/1563 [05:24<09:43,  1.84it/s]
 31%|███       | 487/1563 [05:25<11:05,  1.62it/s]
 31%|███       | 488/1563 [05:26<11:55,  1.50it/s]
 31%|███▏      | 489/1563 [05:27<12:38,  1.42it/s]
 31%|███▏      | 490/1563 [05:27<13:14,  1.35it/s]
 31%|███▏      | 491/1563 [05:28<11:58,  1.49it/s]
 31%|███▏      | 492/1563 [05:28<11:14,  1.59it/s]
 32%|███▏      | 493/1563 [05:29<10:24,  1.71it/s]
 32%|███▏      | 494/1563 [05:30<11:29,  1.55it/s]
 32%|███▏      | 495/1563 [05:30<10:40,  1.67it/s]
 32%|███▏      | 496/1563 [05:31<11:57,  1.49it/s]
 32%|███▏      | 497/1563 [05:31<10:45,  1.65it/s]
 32%|███▏      | 498/1563 [05:32<10:50,  1.64it/s]
 32%|███▏      | 499/1563 [05:33<11:16,  1.57it/s]
 32%|███▏      | 500/1563 [05:34<12:29,  1.42it/s]
{'loss': 0.15, 'grad_norm': 14.0625, 'learning_rate': 1.361484325015995e-05, 'epoch': 0.32}
 32%|███▏      | 501/1563 [05:34<12:59,  1.36it/s]
 32%|███▏      | 502/1563 [05:35<11:37,  1.52it/s]
 32%|███▏      | 503/1563 [05:35<10:59,  1.61it/s]
 32%|███▏      | 504/1563 [05:36<11:44,  1.50it/s]
 32%|███▏      | 505/1563 [05:37<12:49,  1.38it/s]
 32%|███▏      | 506/1563 [05:38<13:18,  1.32it/s]
 32%|███▏      | 507/1563 [05:39<13:42,  1.28it/s]
 33%|███▎      | 508/1563 [05:40<13:45,  1.28it/s]
 33%|███▎      | 509/1563 [05:40<14:00,  1.25it/s]
 33%|███▎      | 510/1563 [05:41<12:55,  1.36it/s]
 33%|███▎      | 511/1563 [05:42<11:46,  1.49it/s]
 33%|███▎      | 512/1563 [05:42<10:55,  1.60it/s]
 33%|███▎      | 513/1563 [05:43<10:57,  1.60it/s]
 33%|███▎      | 514/1563 [05:43<10:52,  1.61it/s]
 33%|███▎      | 515/1563 [05:44<10:08,  1.72it/s]
 33%|███▎      | 516/1563 [05:44<09:46,  1.79it/s]
 33%|███▎      | 517/1563 [05:45<10:17,  1.70it/s]
 33%|███▎      | 518/1563 [05:45<10:06,  1.72it/s]
 33%|███▎      | 519/1563 [05:46<10:19,  1.69it/s]
 33%|███▎      | 520/1563 [05:47<10:16,  1.69it/s]
 33%|███▎      | 521/1563 [05:47<11:05,  1.56it/s]
 33%|███▎      | 522/1563 [05:48<11:32,  1.50it/s]
 33%|███▎      | 523/1563 [05:49<12:23,  1.40it/s]
 34%|███▎      | 524/1563 [05:50<12:38,  1.37it/s]
 34%|███▎      | 525/1563 [05:50<11:23,  1.52it/s]
 34%|███▎      | 526/1563 [05:51<10:31,  1.64it/s]
 34%|███▎      | 527/1563 [05:51<09:47,  1.76it/s]
 34%|███▍      | 528/1563 [05:52<09:32,  1.81it/s]
 34%|███▍      | 529/1563 [05:53<10:46,  1.60it/s]
 34%|███▍      | 530/1563 [05:53<11:03,  1.56it/s]
 34%|███▍      | 531/1563 [05:54<11:41,  1.47it/s]
 34%|███▍      | 532/1563 [05:55<12:23,  1.39it/s]
 34%|███▍      | 533/1563 [05:55<11:39,  1.47it/s]
 34%|███▍      | 534/1563 [05:56<10:35,  1.62it/s]
 34%|███▍      | 535/1563 [05:57<11:36,  1.48it/s]
 34%|███▍      | 536/1563 [05:57<10:32,  1.62it/s]
 34%|███▍      | 537/1563 [05:58<11:41,  1.46it/s]
 34%|███▍      | 538/1563 [05:58<10:38,  1.61it/s]
 34%|███▍      | 539/1563 [05:59<10:42,  1.59it/s]
 35%|███▍      | 540/1563 [06:00<11:06,  1.54it/s]
 35%|███▍      | 541/1563 [06:00<10:50,  1.57it/s]
 35%|███▍      | 542/1563 [06:01<10:15,  1.66it/s]
 35%|███▍      | 543/1563 [06:02<11:19,  1.50it/s]
 35%|███▍      | 544/1563 [06:03<11:52,  1.43it/s]
 35%|███▍      | 545/1563 [06:03<12:37,  1.34it/s]
 35%|███▍      | 546/1563 [06:04<13:11,  1.28it/s]
 35%|███▍      | 547/1563 [06:05<12:41,  1.33it/s]
 35%|███▌      | 548/1563 [06:05<11:05,  1.52it/s]
 35%|███▌      | 549/1563 [06:06<11:03,  1.53it/s]
 35%|███▌      | 550/1563 [06:06<09:48,  1.72it/s]
{'loss': 0.1867, 'grad_norm': 13.75, 'learning_rate': 1.2975047984644915e-05, 'epoch': 0.35}
 35%|███▌      | 551/1563 [06:07<11:12,  1.50it/s]
 35%|███▌      | 552/1563 [06:08<10:50,  1.55it/s]
 35%|███▌      | 553/1563 [06:09<10:59,  1.53it/s]
 35%|███▌      | 554/1563 [06:09<11:31,  1.46it/s]
 36%|███▌      | 555/1563 [06:10<11:40,  1.44it/s]
 36%|███▌      | 556/1563 [06:10<10:33,  1.59it/s]
 36%|███▌      | 557/1563 [06:11<10:28,  1.60it/s]
 36%|███▌      | 558/1563 [06:12<11:32,  1.45it/s]
 36%|███▌      | 559/1563 [06:13<11:37,  1.44it/s]
 36%|███▌      | 560/1563 [06:14<12:20,  1.35it/s]
 36%|███▌      | 561/1563 [06:14<12:42,  1.31it/s]
 36%|███▌      | 562/1563 [06:15<11:38,  1.43it/s]
 36%|███▌      | 563/1563 [06:16<11:40,  1.43it/s]
 36%|███▌      | 564/1563 [06:16<12:18,  1.35it/s]
 36%|███▌      | 565/1563 [06:17<11:10,  1.49it/s]
 36%|███▌      | 566/1563 [06:18<12:09,  1.37it/s]
 36%|███▋      | 567/1563 [06:19<12:52,  1.29it/s]
 36%|███▋      | 568/1563 [06:19<11:18,  1.47it/s]
 36%|███▋      | 569/1563 [06:20<11:38,  1.42it/s]
 36%|███▋      | 570/1563 [06:20<10:33,  1.57it/s]
 37%|███▋      | 571/1563 [06:21<11:05,  1.49it/s]
 37%|███▋      | 572/1563 [06:22<11:43,  1.41it/s]
 37%|███▋      | 573/1563 [06:22<10:48,  1.53it/s]
 37%|███▋      | 574/1563 [06:23<11:46,  1.40it/s]
 37%|███▋      | 575/1563 [06:24<11:43,  1.40it/s]
 37%|███▋      | 576/1563 [06:25<11:26,  1.44it/s]
 37%|███▋      | 577/1563 [06:25<10:45,  1.53it/s]
 37%|███▋      | 578/1563 [06:26<10:57,  1.50it/s]
 37%|███▋      | 579/1563 [06:27<11:52,  1.38it/s]
 37%|███▋      | 580/1563 [06:28<12:27,  1.31it/s]
 37%|███▋      | 581/1563 [06:28<11:47,  1.39it/s]
 37%|███▋      | 582/1563 [06:29<12:19,  1.33it/s]
 37%|███▋      | 583/1563 [06:30<11:35,  1.41it/s]
 37%|███▋      | 584/1563 [06:30<10:25,  1.56it/s]
 37%|███▋      | 585/1563 [06:31<11:24,  1.43it/s]
 37%|███▋      | 586/1563 [06:32<11:40,  1.39it/s]
 38%|███▊      | 587/1563 [06:32<10:20,  1.57it/s]
 38%|███▊      | 588/1563 [06:33<10:40,  1.52it/s]
 38%|███▊      | 589/1563 [06:34<10:56,  1.48it/s]
 38%|███▊      | 590/1563 [06:34<10:28,  1.55it/s]
 38%|███▊      | 591/1563 [06:35<09:35,  1.69it/s]
 38%|███▊      | 592/1563 [06:35<10:33,  1.53it/s]
 38%|███▊      | 593/1563 [06:36<11:26,  1.41it/s]
 38%|███▊      | 594/1563 [06:37<11:16,  1.43it/s]
 38%|███▊      | 595/1563 [06:37<10:10,  1.59it/s]
 38%|███▊      | 596/1563 [06:38<11:18,  1.43it/s]
 38%|███▊      | 597/1563 [06:39<11:19,  1.42it/s]
 38%|███▊      | 598/1563 [06:40<11:15,  1.43it/s]
 38%|███▊      | 599/1563 [06:40<10:47,  1.49it/s]
 38%|███▊      | 600/1563 [06:41<10:46,  1.49it/s]
{'loss': 0.1441, 'grad_norm': 12.9375, 'learning_rate': 1.233525271912988e-05, 'epoch': 0.38}
 38%|███▊      | 601/1563 [06:42<10:20,  1.55it/s]
 39%|███▊      | 602/1563 [06:42<10:39,  1.50it/s]
 39%|███▊      | 603/1563 [06:43<11:30,  1.39it/s]
 39%|███▊      | 604/1563 [06:44<10:25,  1.53it/s]
 39%|███▊      | 605/1563 [06:44<09:32,  1.67it/s]
 39%|███▉      | 606/1563 [06:45<09:12,  1.73it/s]
 39%|███▉      | 607/1563 [06:45<10:15,  1.55it/s]
 39%|███▉      | 608/1563 [06:46<09:38,  1.65it/s]
 39%|███▉      | 609/1563 [06:46<08:56,  1.78it/s]
 39%|███▉      | 610/1563 [06:47<08:35,  1.85it/s]
 39%|███▉      | 611/1563 [06:47<08:37,  1.84it/s]
 39%|███▉      | 612/1563 [06:48<08:20,  1.90it/s]
 39%|███▉      | 613/1563 [06:49<08:40,  1.83it/s]
 39%|███▉      | 614/1563 [06:49<09:41,  1.63it/s]
 39%|███▉      | 615/1563 [06:50<09:58,  1.58it/s]
 39%|███▉      | 616/1563 [06:51<10:42,  1.47it/s]
 39%|███▉      | 617/1563 [06:51<09:47,  1.61it/s]
 40%|███▉      | 618/1563 [06:52<10:25,  1.51it/s]
 40%|███▉      | 619/1563 [06:53<10:27,  1.51it/s]
 40%|███▉      | 620/1563 [06:53<09:28,  1.66it/s]
 40%|███▉      | 621/1563 [06:54<09:50,  1.59it/s]
 40%|███▉      | 622/1563 [06:55<11:01,  1.42it/s]
 40%|███▉      | 623/1563 [06:55<09:59,  1.57it/s]
 40%|███▉      | 624/1563 [06:56<09:58,  1.57it/s]
 40%|███▉      | 625/1563 [06:57<10:15,  1.53it/s]
 40%|████      | 626/1563 [06:57<10:34,  1.48it/s]
 40%|████      | 627/1563 [06:58<10:25,  1.50it/s]
 40%|████      | 628/1563 [06:59<11:19,  1.38it/s]
 40%|████      | 629/1563 [06:59<10:25,  1.49it/s]
 40%|████      | 630/1563 [07:00<11:17,  1.38it/s]
 40%|████      | 631/1563 [07:01<10:28,  1.48it/s]
 40%|████      | 632/1563 [07:01<09:21,  1.66it/s]
 40%|████      | 633/1563 [07:02<10:06,  1.53it/s]
 41%|████      | 634/1563 [07:02<09:23,  1.65it/s]
 41%|████      | 635/1563 [07:03<09:06,  1.70it/s]
 41%|████      | 636/1563 [07:03<08:36,  1.79it/s]
 41%|████      | 637/1563 [07:04<10:00,  1.54it/s]
 41%|████      | 638/1563 [07:05<10:37,  1.45it/s]
 41%|████      | 639/1563 [07:06<09:51,  1.56it/s]
 41%|████      | 640/1563 [07:06<09:47,  1.57it/s]
 41%|████      | 641/1563 [07:07<09:04,  1.69it/s]
 41%|████      | 642/1563 [07:07<09:35,  1.60it/s]
 41%|████      | 643/1563 [07:08<09:14,  1.66it/s]
 41%|████      | 644/1563 [07:09<09:42,  1.58it/s]
 41%|████▏     | 645/1563 [07:10<10:44,  1.42it/s]
 41%|████▏     | 646/1563 [07:10<09:49,  1.56it/s]
 41%|████▏     | 647/1563 [07:11<09:37,  1.59it/s]
 41%|████▏     | 648/1563 [07:12<10:40,  1.43it/s]
 42%|████▏     | 649/1563 [07:12<11:21,  1.34it/s]
 42%|████▏     | 650/1563 [07:13<10:52,  1.40it/s]
{'loss': 0.1872, 'grad_norm': 23.0, 'learning_rate': 1.1695457453614845e-05, 'epoch': 0.42}
 42%|████▏     | 651/1563 [07:14<09:55,  1.53it/s]
 42%|████▏     | 652/1563 [07:14<08:53,  1.71it/s]
 42%|████▏     | 653/1563 [07:15<10:06,  1.50it/s]
 42%|████▏     | 654/1563 [07:15<09:14,  1.64it/s]
 42%|████▏     | 655/1563 [07:16<08:38,  1.75it/s]
 42%|████▏     | 656/1563 [07:17<09:55,  1.52it/s]
 42%|████▏     | 657/1563 [07:17<10:19,  1.46it/s]
 42%|████▏     | 658/1563 [07:18<11:02,  1.37it/s]
 42%|████▏     | 659/1563 [07:19<11:41,  1.29it/s]
 42%|████▏     | 660/1563 [07:20<10:26,  1.44it/s]
 42%|████▏     | 661/1563 [07:20<10:40,  1.41it/s]
 42%|████▏     | 662/1563 [07:21<09:19,  1.61it/s]
 42%|████▏     | 663/1563 [07:21<09:34,  1.57it/s]
 42%|████▏     | 664/1563 [07:22<09:48,  1.53it/s]
 43%|████▎     | 665/1563 [07:23<09:10,  1.63it/s]
 43%|████▎     | 666/1563 [07:23<08:53,  1.68it/s]
 43%|████▎     | 667/1563 [07:24<08:22,  1.78it/s]
 43%|████▎     | 668/1563 [07:24<09:24,  1.59it/s]
 43%|████▎     | 669/1563 [07:25<08:44,  1.70it/s]
 43%|████▎     | 670/1563 [07:25<08:02,  1.85it/s]
 43%|████▎     | 671/1563 [07:26<07:39,  1.94it/s]
 43%|████▎     | 672/1563 [07:26<07:15,  2.04it/s]
 43%|████▎     | 673/1563 [07:27<07:24,  2.00it/s]
 43%|████▎     | 674/1563 [07:28<08:56,  1.66it/s]
 43%|████▎     | 675/1563 [07:28<09:45,  1.52it/s]
 43%|████▎     | 676/1563 [07:29<09:01,  1.64it/s]
 43%|████▎     | 677/1563 [07:30<09:13,  1.60it/s]
 43%|████▎     | 678/1563 [07:30<08:19,  1.77it/s]
 43%|████▎     | 679/1563 [07:31<08:45,  1.68it/s]
 44%|████▎     | 680/1563 [07:31<09:09,  1.61it/s]
 44%|████▎     | 681/1563 [07:32<08:09,  1.80it/s]
 44%|████▎     | 682/1563 [07:32<07:43,  1.90it/s]
 44%|████▎     | 683/1563 [07:33<08:21,  1.76it/s]
 44%|████▍     | 684/1563 [07:34<08:38,  1.69it/s]
 44%|████▍     | 685/1563 [07:34<09:01,  1.62it/s]
 44%|████▍     | 686/1563 [07:35<08:16,  1.77it/s]
 44%|████▍     | 687/1563 [07:35<08:04,  1.81it/s]
 44%|████▍     | 688/1563 [07:36<08:43,  1.67it/s]
 44%|████▍     | 689/1563 [07:37<09:25,  1.55it/s]
 44%|████▍     | 690/1563 [07:38<10:21,  1.41it/s]
 44%|████▍     | 691/1563 [07:38<10:58,  1.32it/s]
 44%|████▍     | 692/1563 [07:39<09:57,  1.46it/s]
 44%|████▍     | 693/1563 [07:39<09:03,  1.60it/s]
 44%|████▍     | 694/1563 [07:40<08:21,  1.73it/s]
 44%|████▍     | 695/1563 [07:40<08:14,  1.76it/s]
 45%|████▍     | 696/1563 [07:41<07:46,  1.86it/s]
 45%|████▍     | 697/1563 [07:41<07:29,  1.93it/s]
 45%|████▍     | 698/1563 [07:42<07:40,  1.88it/s]
 45%|████▍     | 699/1563 [07:42<07:29,  1.92it/s]
 45%|████▍     | 700/1563 [07:43<07:07,  2.02it/s]
{'loss': 0.1618, 'grad_norm': 16.625, 'learning_rate': 1.105566218809981e-05, 'epoch': 0.45}
 45%|████▍     | 701/1563 [07:44<08:45,  1.64it/s]
 45%|████▍     | 702/1563 [07:44<08:32,  1.68it/s]
 45%|████▍     | 703/1563 [07:45<07:54,  1.81it/s]
 45%|████▌     | 704/1563 [07:45<08:53,  1.61it/s]
 45%|████▌     | 705/1563 [07:46<08:14,  1.73it/s]
 45%|████▌     | 706/1563 [07:46<07:43,  1.85it/s]
 45%|████▌     | 707/1563 [07:47<08:44,  1.63it/s]
 45%|████▌     | 708/1563 [07:48<08:04,  1.76it/s]
 45%|████▌     | 709/1563 [07:48<07:37,  1.87it/s]
 45%|████▌     | 710/1563 [07:49<08:14,  1.72it/s]
 45%|████▌     | 711/1563 [07:50<08:49,  1.61it/s]
 46%|████▌     | 712/1563 [07:50<08:43,  1.63it/s]
 46%|████▌     | 713/1563 [07:51<09:29,  1.49it/s]
 46%|████▌     | 714/1563 [07:51<08:29,  1.67it/s]
 46%|████▌     | 715/1563 [07:52<08:45,  1.61it/s]
 46%|████▌     | 716/1563 [07:52<08:01,  1.76it/s]
 46%|████▌     | 717/1563 [07:53<09:02,  1.56it/s]
 46%|████▌     | 718/1563 [07:54<08:42,  1.62it/s]
 46%|████▌     | 719/1563 [07:54<08:00,  1.76it/s]
 46%|████▌     | 720/1563 [07:55<07:51,  1.79it/s]
 46%|████▌     | 721/1563 [07:56<08:49,  1.59it/s]
 46%|████▌     | 722/1563 [07:56<09:32,  1.47it/s]
 46%|████▋     | 723/1563 [07:57<10:14,  1.37it/s]
 46%|████▋     | 724/1563 [07:58<09:01,  1.55it/s]
 46%|████▋     | 725/1563 [07:59<09:51,  1.42it/s]
 46%|████▋     | 726/1563 [07:59<09:37,  1.45it/s]
 47%|████▋     | 727/1563 [08:00<09:37,  1.45it/s]
 47%|████▋     | 728/1563 [08:00<08:38,  1.61it/s]
 47%|████▋     | 729/1563 [08:01<08:54,  1.56it/s]
 47%|████▋     | 730/1563 [08:02<09:17,  1.49it/s]
 47%|████▋     | 731/1563 [08:02<08:39,  1.60it/s]
 47%|████▋     | 732/1563 [08:03<09:29,  1.46it/s]
 47%|████▋     | 733/1563 [08:04<08:56,  1.55it/s]
 47%|████▋     | 734/1563 [08:04<08:14,  1.68it/s]
 47%|████▋     | 735/1563 [08:05<09:16,  1.49it/s]
 47%|████▋     | 736/1563 [08:06<08:31,  1.62it/s]
 47%|████▋     | 737/1563 [08:06<09:10,  1.50it/s]
 47%|████▋     | 738/1563 [08:07<08:32,  1.61it/s]
 47%|████▋     | 739/1563 [08:07<08:03,  1.70it/s]
 47%|████▋     | 740/1563 [08:08<08:58,  1.53it/s]
 47%|████▋     | 741/1563 [08:09<09:29,  1.44it/s]
 47%|████▋     | 742/1563 [08:10<09:52,  1.38it/s]
 48%|████▊     | 743/1563 [08:11<10:29,  1.30it/s]
 48%|████▊     | 744/1563 [08:11<09:26,  1.45it/s]
 48%|████▊     | 745/1563 [08:12<10:10,  1.34it/s]
 48%|████▊     | 746/1563 [08:12<09:01,  1.51it/s]
 48%|████▊     | 747/1563 [08:13<09:05,  1.50it/s]
 48%|████▊     | 748/1563 [08:14<09:55,  1.37it/s]
 48%|████▊     | 749/1563 [08:14<08:51,  1.53it/s]
 48%|████▊     | 750/1563 [08:15<08:16,  1.64it/s]
{'loss': 0.1435, 'grad_norm': 16.0, 'learning_rate': 1.0415866922584774e-05, 'epoch': 0.48}
 48%|████▊     | 751/1563 [08:16<08:51,  1.53it/s]
 48%|████▊     | 752/1563 [08:16<07:55,  1.70it/s]
 48%|████▊     | 753/1563 [08:17<08:10,  1.65it/s]
 48%|████▊     | 754/1563 [08:18<09:03,  1.49it/s]
 48%|████▊     | 755/1563 [08:18<09:04,  1.48it/s]
 48%|████▊     | 756/1563 [08:19<09:52,  1.36it/s]
 48%|████▊     | 757/1563 [08:20<08:52,  1.51it/s]
 48%|████▊     | 758/1563 [08:20<09:10,  1.46it/s]
 49%|████▊     | 759/1563 [08:21<09:37,  1.39it/s]
 49%|████▊     | 760/1563 [08:22<08:37,  1.55it/s]
 49%|████▊     | 761/1563 [08:23<09:24,  1.42it/s]
 49%|████▉     | 762/1563 [08:23<08:16,  1.61it/s]
 49%|████▉     | 763/1563 [08:23<07:41,  1.73it/s]
 49%|████▉     | 764/1563 [08:24<08:29,  1.57it/s]
 49%|████▉     | 765/1563 [08:25<09:21,  1.42it/s]
 49%|████▉     | 766/1563 [08:26<09:58,  1.33it/s]
 49%|████▉     | 767/1563 [08:27<10:19,  1.28it/s]
 49%|████▉     | 768/1563 [08:27<09:46,  1.36it/s]
 49%|████▉     | 769/1563 [08:28<08:46,  1.51it/s]
 49%|████▉     | 770/1563 [08:28<07:57,  1.66it/s]
 49%|████▉     | 771/1563 [08:29<08:52,  1.49it/s]
 49%|████▉     | 772/1563 [08:30<09:05,  1.45it/s]
 49%|████▉     | 773/1563 [08:31<09:27,  1.39it/s]
 50%|████▉     | 774/1563 [08:31<09:18,  1.41it/s]
 50%|████▉     | 775/1563 [08:32<08:19,  1.58it/s]
 50%|████▉     | 776/1563 [08:33<08:47,  1.49it/s]
 50%|████▉     | 777/1563 [08:33<08:36,  1.52it/s]
 50%|████▉     | 778/1563 [08:34<08:55,  1.47it/s]
 50%|████▉     | 779/1563 [08:35<09:34,  1.36it/s]
 50%|████▉     | 780/1563 [08:35<08:46,  1.49it/s]
 50%|████▉     | 781/1563 [08:36<08:51,  1.47it/s]
 50%|█████     | 782/1563 [08:37<07:59,  1.63it/s]
 50%|█████     | 783/1563 [08:37<07:39,  1.70it/s]
 50%|█████     | 784/1563 [08:38<07:41,  1.69it/s]
 50%|█████     | 785/1563 [08:38<08:01,  1.62it/s]
 50%|█████     | 786/1563 [08:39<07:24,  1.75it/s]
 50%|█████     | 787/1563 [08:40<08:15,  1.57it/s]
 50%|█████     | 788/1563 [08:40<07:32,  1.71it/s]
 50%|█████     | 789/1563 [08:41<07:52,  1.64it/s]
 51%|█████     | 790/1563 [08:41<07:15,  1.77it/s]
 51%|█████     | 791/1563 [08:42<07:17,  1.77it/s]
 51%|█████     | 792/1563 [08:42<07:30,  1.71it/s]
 51%|█████     | 793/1563 [08:43<06:54,  1.86it/s]
 51%|█████     | 794/1563 [08:44<07:46,  1.65it/s]
 51%|█████     | 795/1563 [08:44<07:07,  1.80it/s]
 51%|█████     | 796/1563 [08:45<08:13,  1.55it/s]
 51%|█████     | 797/1563 [08:46<09:02,  1.41it/s]
 51%|█████     | 798/1563 [08:46<08:12,  1.55it/s]
 51%|█████     | 799/1563 [08:47<09:00,  1.41it/s]
 51%|█████     | 800/1563 [08:48<09:27,  1.34it/s]
{'loss': 0.1442, 'grad_norm': 1.359375, 'learning_rate': 9.776071657069739e-06, 'epoch': 0.51}
 51%|█████     | 801/1563 [08:49<09:30,  1.34it/s]
 51%|█████▏    | 802/1563 [08:49<09:12,  1.38it/s]
 51%|█████▏    | 803/1563 [08:50<08:01,  1.58it/s]
 51%|█████▏    | 804/1563 [08:51<08:58,  1.41it/s]
 52%|█████▏    | 805/1563 [08:51<09:24,  1.34it/s]
 52%|█████▏    | 806/1563 [08:52<09:22,  1.35it/s]
 52%|█████▏    | 807/1563 [08:53<08:33,  1.47it/s]
 52%|█████▏    | 808/1563 [08:53<07:47,  1.61it/s]
 52%|█████▏    | 809/1563 [08:54<08:37,  1.46it/s]
 52%|█████▏    | 810/1563 [08:55<09:07,  1.38it/s]
 52%|█████▏    | 811/1563 [08:55<07:59,  1.57it/s]
 52%|█████▏    | 812/1563 [08:56<08:13,  1.52it/s]
 52%|█████▏    | 813/1563 [08:57<08:36,  1.45it/s]
 52%|█████▏    | 814/1563 [08:57<08:31,  1.46it/s]
 52%|█████▏    | 815/1563 [08:58<09:05,  1.37it/s]
 52%|█████▏    | 816/1563 [08:59<09:21,  1.33it/s]
 52%|█████▏    | 817/1563 [09:00<09:04,  1.37it/s]
 52%|█████▏    | 818/1563 [09:00<08:58,  1.38it/s]
 52%|█████▏    | 819/1563 [09:01<07:51,  1.58it/s]
 52%|█████▏    | 820/1563 [09:01<07:42,  1.61it/s]
 53%|█████▎    | 821/1563 [09:02<07:15,  1.70it/s]
 53%|█████▎    | 822/1563 [09:02<06:44,  1.83it/s]
 53%|█████▎    | 823/1563 [09:03<07:52,  1.57it/s]
 53%|█████▎    | 824/1563 [09:04<08:42,  1.41it/s]
 53%|█████▎    | 825/1563 [09:05<08:46,  1.40it/s]
 53%|█████▎    | 826/1563 [09:05<08:17,  1.48it/s]
 53%|█████▎    | 827/1563 [09:06<07:30,  1.64it/s]
 53%|█████▎    | 828/1563 [09:07<08:13,  1.49it/s]
 53%|█████▎    | 829/1563 [09:08<08:40,  1.41it/s]
 53%|█████▎    | 830/1563 [09:08<08:19,  1.47it/s]
 53%|█████▎    | 831/1563 [09:09<08:19,  1.46it/s]
 53%|█████▎    | 832/1563 [09:09<07:37,  1.60it/s]
 53%|█████▎    | 833/1563 [09:10<07:50,  1.55it/s]
 53%|█████▎    | 834/1563 [09:10<07:03,  1.72it/s]
 53%|█████▎    | 835/1563 [09:11<07:14,  1.68it/s]
 53%|█████▎    | 836/1563 [09:12<08:08,  1.49it/s]
 54%|█████▎    | 837/1563 [09:13<08:08,  1.49it/s]
 54%|█████▎    | 838/1563 [09:13<08:21,  1.44it/s]
 54%|█████▎    | 839/1563 [09:14<08:51,  1.36it/s]
 54%|█████▎    | 840/1563 [09:15<08:45,  1.37it/s]
 54%|█████▍    | 841/1563 [09:16<09:04,  1.33it/s]
 54%|█████▍    | 842/1563 [09:17<09:27,  1.27it/s]
 54%|█████▍    | 843/1563 [09:17<09:41,  1.24it/s]
 54%|█████▍    | 844/1563 [09:18<09:18,  1.29it/s]
 54%|█████▍    | 845/1563 [09:19<08:53,  1.35it/s]
 54%|█████▍    | 846/1563 [09:19<08:22,  1.43it/s]
 54%|█████▍    | 847/1563 [09:20<07:29,  1.59it/s]
 54%|█████▍    | 848/1563 [09:21<08:12,  1.45it/s]
 54%|█████▍    | 849/1563 [09:22<08:42,  1.37it/s]
 54%|█████▍    | 850/1563 [09:22<07:55,  1.50it/s]
{'loss': 0.1416, 'grad_norm': 1.0859375, 'learning_rate': 9.136276391554704e-06, 'epoch': 0.54}
 54%|█████▍    | 851/1563 [09:23<08:05,  1.47it/s]
 55%|█████▍    | 852/1563 [09:23<07:20,  1.62it/s]
 55%|█████▍    | 853/1563 [09:24<07:00,  1.69it/s]
 55%|█████▍    | 854/1563 [09:24<06:31,  1.81it/s]
 55%|█████▍    | 855/1563 [09:25<07:20,  1.61it/s]
 55%|█████▍    | 856/1563 [09:26<07:03,  1.67it/s]
 55%|█████▍    | 857/1563 [09:26<06:33,  1.79it/s]
 55%|█████▍    | 858/1563 [09:27<07:33,  1.55it/s]
 55%|█████▍    | 859/1563 [09:28<07:41,  1.53it/s]
 55%|█████▌    | 860/1563 [09:28<07:04,  1.66it/s]
 55%|█████▌    | 861/1563 [09:29<07:14,  1.62it/s]
 55%|█████▌    | 862/1563 [09:29<06:28,  1.81it/s]
 55%|█████▌    | 863/1563 [09:30<06:04,  1.92it/s]
 55%|█████▌    | 864/1563 [09:30<06:33,  1.78it/s]
 55%|█████▌    | 865/1563 [09:31<07:24,  1.57it/s]
 55%|█████▌    | 866/1563 [09:31<06:43,  1.73it/s]
 55%|█████▌    | 867/1563 [09:32<07:26,  1.56it/s]
 56%|█████▌    | 868/1563 [09:33<07:24,  1.56it/s]
 56%|█████▌    | 869/1563 [09:33<06:48,  1.70it/s]
 56%|█████▌    | 870/1563 [09:34<06:22,  1.81it/s]
 56%|█████▌    | 871/1563 [09:34<06:05,  1.89it/s]
 56%|█████▌    | 872/1563 [09:35<07:09,  1.61it/s]
 56%|█████▌    | 873/1563 [09:36<06:40,  1.72it/s]
 56%|█████▌    | 874/1563 [09:36<06:54,  1.66it/s]
 56%|█████▌    | 875/1563 [09:37<07:20,  1.56it/s]
 56%|█████▌    | 876/1563 [09:38<07:35,  1.51it/s]
 56%|█████▌    | 877/1563 [09:38<07:23,  1.55it/s]
 56%|█████▌    | 878/1563 [09:39<06:46,  1.68it/s]
 56%|█████▌    | 879/1563 [09:39<07:08,  1.60it/s]
 56%|█████▋    | 880/1563 [09:40<07:45,  1.47it/s]
 56%|█████▋    | 881/1563 [09:41<06:54,  1.65it/s]
 56%|█████▋    | 882/1563 [09:42<07:47,  1.46it/s]
 56%|█████▋    | 883/1563 [09:42<07:02,  1.61it/s]
 57%|█████▋    | 884/1563 [09:43<07:49,  1.45it/s]
 57%|█████▋    | 885/1563 [09:44<07:44,  1.46it/s]
 57%|█████▋    | 886/1563 [09:44<07:30,  1.50it/s]
 57%|█████▋    | 887/1563 [09:45<06:42,  1.68it/s]
 57%|█████▋    | 888/1563 [09:45<07:03,  1.59it/s]
 57%|█████▋    | 889/1563 [09:46<07:25,  1.51it/s]
 57%|█████▋    | 890/1563 [09:47<07:57,  1.41it/s]
 57%|█████▋    | 891/1563 [09:47<07:22,  1.52it/s]
 57%|█████▋    | 892/1563 [09:48<06:45,  1.66it/s]
 57%|█████▋    | 893/1563 [09:48<06:24,  1.74it/s]
 57%|█████▋    | 894/1563 [09:49<06:08,  1.82it/s]
 57%|█████▋    | 895/1563 [09:50<07:10,  1.55it/s]
 57%|█████▋    | 896/1563 [09:50<06:50,  1.63it/s]
 57%|█████▋    | 897/1563 [09:51<06:38,  1.67it/s]
 57%|█████▋    | 898/1563 [09:51<06:20,  1.75it/s]
 58%|█████▊    | 899/1563 [09:52<06:25,  1.72it/s]
 58%|█████▊    | 900/1563 [09:52<06:03,  1.83it/s]
{'loss': 0.1478, 'grad_norm': 5.09375, 'learning_rate': 8.496481126039668e-06, 'epoch': 0.58}

 58%|█████▊    | 901/1563 [09:53<05:52,  1.88it/s]
 58%|█████▊    | 902/1563 [09:54<06:17,  1.75it/s]
 58%|█████▊    | 903/1563 [09:54<06:01,  1.83it/s]
 58%|█████▊    | 904/1563 [09:55<07:03,  1.56it/s]
 58%|█████▊    | 905/1563 [09:55<06:29,  1.69it/s]
 58%|█████▊    | 906/1563 [09:56<06:09,  1.78it/s]
 58%|█████▊    | 907/1563 [09:57<06:46,  1.61it/s]
 58%|█████▊    | 908/1563 [09:57<06:19,  1.73it/s]
 58%|█████▊    | 909/1563 [09:58<06:40,  1.63it/s]
 58%|█████▊    | 910/1563 [09:58<06:14,  1.75it/s]
 58%|█████▊    | 911/1563 [09:59<06:05,  1.78it/s]
 58%|█████▊    | 912/1563 [10:00<06:28,  1.68it/s]
 58%|█████▊    | 913/1563 [10:00<06:12,  1.74it/s]
 58%|█████▊    | 914/1563 [10:01<06:31,  1.66it/s]
 59%|█████▊    | 915/1563 [10:01<06:19,  1.71it/s]
 59%|█████▊    | 916/1563 [10:02<06:11,  1.74it/s]
 59%|█████▊    | 917/1563 [10:02<05:45,  1.87it/s]
 59%|█████▊    | 918/1563 [10:03<05:36,  1.92it/s]
 59%|█████▉    | 919/1563 [10:04<06:17,  1.70it/s]
 59%|█████▉    | 920/1563 [10:04<06:56,  1.55it/s]
 59%|█████▉    | 921/1563 [10:05<07:02,  1.52it/s]
 59%|█████▉    | 922/1563 [10:06<07:04,  1.51it/s]
 59%|█████▉    | 923/1563 [10:06<07:23,  1.44it/s]
 59%|█████▉    | 924/1563 [10:07<07:12,  1.48it/s]
 59%|█████▉    | 925/1563 [10:08<06:29,  1.64it/s]
 59%|█████▉    | 926/1563 [10:08<07:05,  1.50it/s]
 59%|█████▉    | 927/1563 [10:09<07:20,  1.44it/s]
 59%|█████▉    | 928/1563 [10:10<06:42,  1.58it/s]
 59%|█████▉    | 929/1563 [10:10<06:10,  1.71it/s]
 60%|█████▉    | 930/1563 [10:10<05:44,  1.84it/s]
 60%|█████▉    | 931/1563 [10:11<05:30,  1.91it/s]
 60%|█████▉    | 932/1563 [10:11<05:13,  2.01it/s]
 60%|█████▉    | 933/1563 [10:12<05:47,  1.81it/s]
 60%|█████▉    | 934/1563 [10:13<06:41,  1.56it/s]
 60%|█████▉    | 935/1563 [10:13<06:17,  1.66it/s]
 60%|█████▉    | 936/1563 [10:14<07:04,  1.48it/s]
 60%|█████▉    | 937/1563 [10:15<06:19,  1.65it/s]
 60%|██████    | 938/1563 [10:16<06:57,  1.50it/s]
 60%|██████    | 939/1563 [10:16<07:07,  1.46it/s]
 60%|██████    | 940/1563 [10:17<06:38,  1.56it/s]
 60%|██████    | 941/1563 [10:17<05:58,  1.73it/s]
 60%|██████    | 942/1563 [10:18<06:34,  1.57it/s]
 60%|██████    | 943/1563 [10:19<06:54,  1.50it/s]
 60%|██████    | 944/1563 [10:19<06:49,  1.51it/s]
 60%|██████    | 945/1563 [10:20<06:18,  1.63it/s]
 61%|██████    | 946/1563 [10:20<05:44,  1.79it/s]
 61%|██████    | 947/1563 [10:21<06:04,  1.69it/s]
 61%|██████    | 948/1563 [10:21<05:31,  1.86it/s]
 61%|██████    | 949/1563 [10:22<05:19,  1.92it/s]
 61%|██████    | 950/1563 [10:22<05:15,  1.94it/s]
{'loss': 0.1462, 'grad_norm': 27.75, 'learning_rate': 7.856685860524633e-06, 'epoch': 0.61}

 61%|██████    | 951/1563 [10:23<06:16,  1.63it/s]
 61%|██████    | 952/1563 [10:24<05:56,  1.71it/s]
 61%|██████    | 953/1563 [10:24<05:36,  1.81it/s]
 61%|██████    | 954/1563 [10:25<05:29,  1.85it/s]
 61%|██████    | 955/1563 [10:26<06:08,  1.65it/s]
 61%|██████    | 956/1563 [10:26<06:20,  1.59it/s]
 61%|██████    | 957/1563 [10:27<06:33,  1.54it/s]
 61%|██████▏   | 958/1563 [10:27<06:14,  1.62it/s]
 61%|██████▏   | 959/1563 [10:28<06:08,  1.64it/s]
 61%|██████▏   | 960/1563 [10:29<06:36,  1.52it/s]
 61%|██████▏   | 961/1563 [10:29<06:41,  1.50it/s]
 62%|██████▏   | 962/1563 [10:30<06:36,  1.52it/s]
 62%|██████▏   | 963/1563 [10:31<06:15,  1.60it/s]
 62%|██████▏   | 964/1563 [10:31<06:36,  1.51it/s]
 62%|██████▏   | 965/1563 [10:32<06:22,  1.56it/s]
 62%|██████▏   | 966/1563 [10:33<06:15,  1.59it/s]
 62%|██████▏   | 967/1563 [10:33<06:32,  1.52it/s]
 62%|██████▏   | 968/1563 [10:34<05:59,  1.65it/s]
 62%|██████▏   | 969/1563 [10:35<06:41,  1.48it/s]
 62%|██████▏   | 970/1563 [10:35<06:04,  1.63it/s]
 62%|██████▏   | 971/1563 [10:36<06:34,  1.50it/s]
 62%|██████▏   | 972/1563 [10:36<06:05,  1.62it/s]
 62%|██████▏   | 973/1563 [10:37<06:50,  1.44it/s]
 62%|██████▏   | 974/1563 [10:38<06:47,  1.45it/s]
 62%|██████▏   | 975/1563 [10:39<07:15,  1.35it/s]
 62%|██████▏   | 976/1563 [10:40<07:29,  1.30it/s]
 63%|██████▎   | 977/1563 [10:40<07:16,  1.34it/s]
 63%|██████▎   | 978/1563 [10:41<07:03,  1.38it/s]
 63%|██████▎   | 979/1563 [10:42<07:03,  1.38it/s]
 63%|██████▎   | 980/1563 [10:43<07:27,  1.30it/s]
 63%|██████▎   | 981/1563 [10:43<07:39,  1.27it/s]
 63%|██████▎   | 982/1563 [10:44<07:41,  1.26it/s]
 63%|██████▎   | 983/1563 [10:45<07:51,  1.23it/s]
 63%|██████▎   | 984/1563 [10:46<07:00,  1.38it/s]
 63%|██████▎   | 985/1563 [10:46<06:14,  1.54it/s]
 63%|██████▎   | 986/1563 [10:47<06:52,  1.40it/s]
 63%|██████▎   | 987/1563 [10:48<06:53,  1.39it/s]
 63%|██████▎   | 988/1563 [10:48<06:53,  1.39it/s]
 63%|██████▎   | 989/1563 [10:49<06:43,  1.42it/s]
 63%|██████▎   | 990/1563 [10:50<06:22,  1.50it/s]
 63%|██████▎   | 991/1563 [10:50<06:46,  1.41it/s]
 63%|██████▎   | 992/1563 [10:51<06:14,  1.53it/s]
 64%|██████▎   | 993/1563 [10:52<06:04,  1.56it/s]
 64%|██████▎   | 994/1563 [10:52<06:06,  1.55it/s]
 64%|██████▎   | 995/1563 [10:53<05:51,  1.62it/s]
 64%|██████▎   | 996/1563 [10:53<05:33,  1.70it/s]
 64%|██████▍   | 997/1563 [10:54<05:15,  1.80it/s]
 64%|██████▍   | 998/1563 [10:55<06:05,  1.54it/s]
 64%|██████▍   | 999/1563 [10:55<05:53,  1.60it/s]
 64%|██████▍   | 1000/1563 [10:56<06:25,  1.46it/s]
{'loss': 0.1499, 'grad_norm': 0.921875, 'learning_rate': 7.216890595009598e-06, 'epoch': 0.64}

 64%|██████▍   | 1001/1563 [10:57<06:18,  1.48it/s]
 64%|██████▍   | 1002/1563 [10:57<06:20,  1.47it/s]
 64%|██████▍   | 1003/1563 [10:58<06:31,  1.43it/s]
 64%|██████▍   | 1004/1563 [10:59<06:33,  1.42it/s]
 64%|██████▍   | 1005/1563 [10:59<06:01,  1.54it/s]
 64%|██████▍   | 1006/1563 [11:00<06:42,  1.39it/s]
 64%|██████▍   | 1007/1563 [11:01<07:06,  1.31it/s]
 64%|██████▍   | 1008/1563 [11:02<07:16,  1.27it/s]
 65%|██████▍   | 1009/1563 [11:03<06:51,  1.34it/s]
 65%|██████▍   | 1010/1563 [11:04<07:09,  1.29it/s]
 65%|██████▍   | 1011/1563 [11:04<07:22,  1.25it/s]
 65%|██████▍   | 1012/1563 [11:05<06:26,  1.43it/s]
 65%|██████▍   | 1013/1563 [11:06<06:29,  1.41it/s]
 65%|██████▍   | 1014/1563 [11:06<05:42,  1.60it/s]
 65%|██████▍   | 1015/1563 [11:06<05:21,  1.70it/s]
 65%|██████▌   | 1016/1563 [11:07<05:07,  1.78it/s]
 65%|██████▌   | 1017/1563 [11:07<04:50,  1.88it/s]
 65%|██████▌   | 1018/1563 [11:08<05:20,  1.70it/s]
 65%|██████▌   | 1019/1563 [11:09<05:31,  1.64it/s]
 65%|██████▌   | 1020/1563 [11:09<05:06,  1.77it/s]
 65%|██████▌   | 1021/1563 [11:10<05:30,  1.64it/s]
 65%|██████▌   | 1022/1563 [11:11<05:14,  1.72it/s]
 65%|██████▌   | 1023/1563 [11:11<04:50,  1.86it/s]
 66%|██████▌   | 1024/1563 [11:11<04:45,  1.89it/s]
 66%|██████▌   | 1025/1563 [11:12<05:09,  1.74it/s]
 66%|██████▌   | 1026/1563 [11:13<04:52,  1.83it/s]
 66%|██████▌   | 1027/1563 [11:13<05:37,  1.59it/s]
 66%|██████▌   | 1028/1563 [11:14<06:13,  1.43it/s]
 66%|██████▌   | 1029/1563 [11:15<06:31,  1.36it/s]
 66%|██████▌   | 1030/1563 [11:16<06:24,  1.38it/s]
 66%|██████▌   | 1031/1563 [11:17<06:19,  1.40it/s]
 66%|██████▌   | 1032/1563 [11:17<05:35,  1.58it/s]
 66%|██████▌   | 1033/1563 [11:18<05:39,  1.56it/s]
 66%|██████▌   | 1034/1563 [11:18<06:15,  1.41it/s]
 66%|██████▌   | 1035/1563 [11:19<06:14,  1.41it/s]
 66%|██████▋   | 1036/1563 [11:20<05:33,  1.58it/s]
 66%|██████▋   | 1037/1563 [11:20<05:44,  1.53it/s]
 66%|██████▋   | 1038/1563 [11:21<06:09,  1.42it/s]
 66%|██████▋   | 1039/1563 [11:22<05:44,  1.52it/s]
 67%|██████▋   | 1040/1563 [11:23<06:15,  1.39it/s]
 67%|██████▋   | 1041/1563 [11:23<05:40,  1.53it/s]
 67%|██████▋   | 1042/1563 [11:24<06:04,  1.43it/s]
 67%|██████▋   | 1043/1563 [11:24<05:33,  1.56it/s]
 67%|██████▋   | 1044/1563 [11:25<06:07,  1.41it/s]
 67%|██████▋   | 1045/1563 [11:26<06:12,  1.39it/s]
 67%|██████▋   | 1046/1563 [11:26<05:27,  1.58it/s]
 67%|██████▋   | 1047/1563 [11:27<05:57,  1.44it/s]
 67%|██████▋   | 1048/1563 [11:28<05:17,  1.62it/s]
 67%|██████▋   | 1049/1563 [11:28<04:51,  1.77it/s]
 67%|██████▋   | 1050/1563 [11:29<04:28,  1.91it/s]
{'loss': 0.1438, 'grad_norm': 13.25, 'learning_rate': 6.577095329494563e-06, 'epoch': 0.67}

 67%|██████▋   | 1051/1563 [11:29<04:18,  1.98it/s]
 67%|██████▋   | 1052/1563 [11:30<04:14,  2.01it/s]
 67%|██████▋   | 1053/1563 [11:30<05:19,  1.60it/s]
 67%|██████▋   | 1054/1563 [11:31<05:51,  1.45it/s]
 67%|██████▋   | 1055/1563 [11:32<06:14,  1.36it/s]
 68%|██████▊   | 1056/1563 [11:33<06:18,  1.34it/s]
 68%|██████▊   | 1057/1563 [11:34<06:13,  1.35it/s]
 68%|██████▊   | 1058/1563 [11:34<06:28,  1.30it/s]
 68%|██████▊   | 1059/1563 [11:35<06:41,  1.26it/s]
 68%|██████▊   | 1060/1563 [11:36<06:28,  1.29it/s]
 68%|██████▊   | 1061/1563 [11:37<06:31,  1.28it/s]
 68%|██████▊   | 1062/1563 [11:38<06:34,  1.27it/s]
 68%|██████▊   | 1063/1563 [11:38<06:37,  1.26it/s]
 68%|██████▊   | 1064/1563 [11:39<05:58,  1.39it/s]
 68%|██████▊   | 1065/1563 [11:40<06:18,  1.31it/s]
 68%|██████▊   | 1066/1563 [11:40<05:30,  1.50it/s]
 68%|██████▊   | 1067/1563 [11:41<05:45,  1.44it/s]
 68%|██████▊   | 1068/1563 [11:42<05:45,  1.43it/s]
 68%|██████▊   | 1069/1563 [11:43<06:04,  1.36it/s]
 68%|██████▊   | 1070/1563 [11:43<06:20,  1.29it/s]
 69%|██████▊   | 1071/1563 [11:44<05:32,  1.48it/s]
 69%|██████▊   | 1072/1563 [11:45<05:59,  1.36it/s]
 69%|██████▊   | 1073/1563 [11:46<06:18,  1.30it/s]
 69%|██████▊   | 1074/1563 [11:46<05:28,  1.49it/s]
 69%|██████▉   | 1075/1563 [11:47<05:52,  1.38it/s]
 69%|██████▉   | 1076/1563 [11:48<06:06,  1.33it/s]
 69%|██████▉   | 1077/1563 [11:48<05:45,  1.41it/s]
 69%|██████▉   | 1078/1563 [11:49<05:37,  1.44it/s]
 69%|██████▉   | 1079/1563 [11:50<05:55,  1.36it/s]
 69%|██████▉   | 1080/1563 [11:50<05:31,  1.46it/s]
 69%|██████▉   | 1081/1563 [11:51<05:58,  1.34it/s]
 69%|██████▉   | 1082/1563 [11:52<06:16,  1.28it/s]
 69%|██████▉   | 1083/1563 [11:53<05:57,  1.34it/s]
 69%|██████▉   | 1084/1563 [11:54<06:07,  1.30it/s]
 69%|██████▉   | 1085/1563 [11:54<05:44,  1.39it/s]
 69%|██████▉   | 1086/1563 [11:55<05:18,  1.50it/s]
 70%|██████▉   | 1087/1563 [11:55<05:18,  1.50it/s]
 70%|██████▉   | 1088/1563 [11:56<04:56,  1.60it/s]
 70%|██████▉   | 1089/1563 [11:56<04:36,  1.72it/s]
 70%|██████▉   | 1090/1563 [11:57<05:12,  1.51it/s]
 70%|██████▉   | 1091/1563 [11:58<05:40,  1.39it/s]
 70%|██████▉   | 1092/1563 [11:59<05:11,  1.51it/s]
 70%|██████▉   | 1093/1563 [11:59<05:08,  1.52it/s]
 70%|██████▉   | 1094/1563 [12:00<05:35,  1.40it/s]
 70%|███████   | 1095/1563 [12:01<04:58,  1.57it/s]
 70%|███████   | 1096/1563 [12:01<04:30,  1.73it/s]
 70%|███████   | 1097/1563 [12:02<04:39,  1.67it/s]
 70%|███████   | 1098/1563 [12:03<05:12,  1.49it/s]
 70%|███████   | 1099/1563 [12:03<05:29,  1.41it/s]
 70%|███████   | 1100/1563 [12:04<05:33,  1.39it/s]
{'loss': 0.1419, 'grad_norm': 9.1875, 'learning_rate': 5.937300063979527e-06, 'epoch': 0.7}

 70%|███████   | 1101/1563 [12:05<05:09,  1.49it/s]
 71%|███████   | 1102/1563 [12:06<05:32,  1.39it/s]
 71%|███████   | 1103/1563 [12:06<05:05,  1.51it/s]
 71%|███████   | 1104/1563 [12:07<04:39,  1.64it/s]
 71%|███████   | 1105/1563 [12:07<04:25,  1.72it/s]
 71%|███████   | 1106/1563 [12:08<05:03,  1.51it/s]
 71%|███████   | 1107/1563 [12:09<05:04,  1.50it/s]
 71%|███████   | 1108/1563 [12:09<05:17,  1.43it/s]
 71%|███████   | 1109/1563 [12:10<05:40,  1.33it/s]
 71%|███████   | 1110/1563 [12:11<04:53,  1.54it/s]
 71%|███████   | 1111/1563 [12:11<05:16,  1.43it/s]
 71%|███████   | 1112/1563 [12:12<04:45,  1.58it/s]
 71%|███████   | 1113/1563 [12:13<05:12,  1.44it/s]
 71%|███████▏  | 1114/1563 [12:14<05:22,  1.39it/s]
 71%|███████▏  | 1115/1563 [12:14<05:36,  1.33it/s]
 71%|███████▏  | 1116/1563 [12:15<05:01,  1.48it/s]
 71%|███████▏  | 1117/1563 [12:16<05:16,  1.41it/s]
 72%|███████▏  | 1118/1563 [12:16<04:58,  1.49it/s]
 72%|███████▏  | 1119/1563 [12:17<05:19,  1.39it/s]
 72%|███████▏  | 1120/1563 [12:18<05:27,  1.35it/s]
 72%|███████▏  | 1121/1563 [12:19<05:20,  1.38it/s]
 72%|███████▏  | 1122/1563 [12:19<04:50,  1.52it/s]
 72%|███████▏  | 1123/1563 [12:20<05:14,  1.40it/s]
 72%|███████▏  | 1124/1563 [12:21<05:25,  1.35it/s]
 72%|███████▏  | 1125/1563 [12:21<05:27,  1.34it/s]
 72%|███████▏  | 1126/1563 [12:22<04:53,  1.49it/s]
 72%|███████▏  | 1127/1563 [12:23<04:40,  1.56it/s]
 72%|███████▏  | 1128/1563 [12:23<04:53,  1.48it/s]
 72%|███████▏  | 1129/1563 [12:24<04:26,  1.63it/s]
 72%|███████▏  | 1130/1563 [12:25<04:54,  1.47it/s]
 72%|███████▏  | 1131/1563 [12:25<05:15,  1.37it/s]
 72%|███████▏  | 1132/1563 [12:26<04:35,  1.57it/s]
 72%|███████▏  | 1133/1563 [12:27<04:38,  1.55it/s]
 73%|███████▎  | 1134/1563 [12:27<04:20,  1.65it/s]
 73%|███████▎  | 1135/1563 [12:28<04:41,  1.52it/s]
 73%|███████▎  | 1136/1563 [12:29<04:55,  1.45it/s]
 73%|███████▎  | 1137/1563 [12:29<05:16,  1.35it/s]
 73%|███████▎  | 1138/1563 [12:30<04:39,  1.52it/s]
 73%|███████▎  | 1139/1563 [12:31<04:37,  1.53it/s]
 73%|███████▎  | 1140/1563 [12:31<04:26,  1.59it/s]
 73%|███████▎  | 1141/1563 [12:32<04:41,  1.50it/s]
 73%|███████▎  | 1142/1563 [12:33<05:01,  1.40it/s]
 73%|███████▎  | 1143/1563 [12:33<05:10,  1.35it/s]
 73%|███████▎  | 1144/1563 [12:34<05:23,  1.30it/s]
 73%|███████▎  | 1145/1563 [12:35<05:30,  1.26it/s]
 73%|███████▎  | 1146/1563 [12:36<05:33,  1.25it/s]
 73%|███████▎  | 1147/1563 [12:37<05:38,  1.23it/s]
 73%|███████▎  | 1148/1563 [12:38<05:31,  1.25it/s]
 74%|███████▎  | 1149/1563 [12:38<04:52,  1.41it/s]
 74%|███████▎  | 1150/1563 [12:39<05:05,  1.35it/s]
{'loss': 0.1443, 'grad_norm': 3.234375, 'learning_rate': 5.297504798464492e-06, 'epoch': 0.74}

 74%|███████▎  | 1151/1563 [12:40<05:08,  1.34it/s]
 74%|███████▎  | 1152/1563 [12:41<05:18,  1.29it/s]
 74%|███████▍  | 1153/1563 [12:41<05:09,  1.32it/s]
 74%|███████▍  | 1154/1563 [12:42<04:29,  1.52it/s]
 74%|███████▍  | 1155/1563 [12:42<04:07,  1.65it/s]
 74%|███████▍  | 1156/1563 [12:43<04:06,  1.65it/s]
 74%|███████▍  | 1157/1563 [12:43<03:56,  1.72it/s]
 74%|███████▍  | 1158/1563 [12:44<03:40,  1.84it/s]
 74%|███████▍  | 1159/1563 [12:44<03:53,  1.73it/s]
 74%|███████▍  | 1160/1563 [12:45<04:07,  1.63it/s]
 74%|███████▍  | 1161/1563 [12:46<04:17,  1.56it/s]
 74%|███████▍  | 1162/1563 [12:47<04:32,  1.47it/s]
 74%|███████▍  | 1163/1563 [12:47<04:15,  1.56it/s]
 74%|███████▍  | 1164/1563 [12:48<04:09,  1.60it/s]
 75%|███████▍  | 1165/1563 [12:48<03:56,  1.68it/s]
 75%|███████▍  | 1166/1563 [12:49<04:27,  1.49it/s]
 75%|███████▍  | 1167/1563 [12:50<04:48,  1.37it/s]
 75%|███████▍  | 1168/1563 [12:50<04:15,  1.54it/s]
 75%|███████▍  | 1169/1563 [12:51<04:28,  1.46it/s]
 75%|███████▍  | 1170/1563 [12:52<04:24,  1.49it/s]
 75%|███████▍  | 1171/1563 [12:53<04:37,  1.41it/s]
 75%|███████▍  | 1172/1563 [12:53<04:11,  1.55it/s]
 75%|███████▌  | 1173/1563 [12:54<04:31,  1.44it/s]
 75%|███████▌  | 1174/1563 [12:55<04:50,  1.34it/s]
 75%|███████▌  | 1175/1563 [12:55<04:41,  1.38it/s]
 75%|███████▌  | 1176/1563 [12:56<04:54,  1.31it/s]
 75%|███████▌  | 1177/1563 [12:57<04:43,  1.36it/s]
 75%|███████▌  | 1178/1563 [12:57<04:11,  1.53it/s]
 75%|███████▌  | 1179/1563 [12:58<04:27,  1.44it/s]
 75%|███████▌  | 1180/1563 [12:59<04:36,  1.38it/s]
 76%|███████▌  | 1181/1563 [13:00<04:09,  1.53it/s]
 76%|███████▌  | 1182/1563 [13:00<03:47,  1.67it/s]
 76%|███████▌  | 1183/1563 [13:01<04:03,  1.56it/s]
 76%|███████▌  | 1184/1563 [13:01<03:44,  1.69it/s]
 76%|███████▌  | 1185/1563 [13:02<03:31,  1.79it/s]
 76%|███████▌  | 1186/1563 [13:02<03:17,  1.91it/s]
 76%|███████▌  | 1187/1563 [13:03<03:45,  1.67it/s]
 76%|███████▌  | 1188/1563 [13:04<04:13,  1.48it/s]
 76%|███████▌  | 1189/1563 [13:05<04:35,  1.36it/s]
 76%|███████▌  | 1190/1563 [13:05<04:28,  1.39it/s]
 76%|███████▌  | 1191/1563 [13:06<03:59,  1.56it/s]
 76%|███████▋  | 1192/1563 [13:06<04:03,  1.53it/s]
 76%|███████▋  | 1193/1563 [13:07<04:25,  1.39it/s]
 76%|███████▋  | 1194/1563 [13:08<04:10,  1.47it/s]
 76%|███████▋  | 1195/1563 [13:09<04:30,  1.36it/s]
 77%|███████▋  | 1196/1563 [13:10<04:43,  1.29it/s]
 77%|███████▋  | 1197/1563 [13:10<04:10,  1.46it/s]
 77%|███████▋  | 1198/1563 [13:11<03:44,  1.63it/s]
 77%|███████▋  | 1199/1563 [13:11<03:30,  1.73it/s]
 77%|███████▋  | 1200/1563 [13:12<03:26,  1.75it/s]
{'loss': 0.1396, 'grad_norm': 1.1015625, 'learning_rate': 4.657709532949457e-06, 'epoch': 0.77}

 77%|███████▋  | 1201/1563 [13:12<03:54,  1.54it/s]
 77%|███████▋  | 1202/1563 [13:13<04:04,  1.48it/s]
 77%|███████▋  | 1203/1563 [13:14<03:56,  1.52it/s]
 77%|███████▋  | 1204/1563 [13:14<03:30,  1.70it/s]
 77%|███████▋  | 1205/1563 [13:15<03:59,  1.50it/s]
 77%|███████▋  | 1206/1563 [13:16<03:32,  1.68it/s]
 77%|███████▋  | 1207/1563 [13:16<03:41,  1.60it/s]
 77%|███████▋  | 1208/1563 [13:17<03:47,  1.56it/s]
 77%|███████▋  | 1209/1563 [13:18<03:51,  1.53it/s]
 77%|███████▋  | 1210/1563 [13:18<03:36,  1.63it/s]
 77%|███████▋  | 1211/1563 [13:19<04:00,  1.46it/s]
 78%|███████▊  | 1212/1563 [13:19<03:37,  1.61it/s]
 78%|███████▊  | 1213/1563 [13:20<04:02,  1.45it/s]
 78%|███████▊  | 1214/1563 [13:21<03:40,  1.58it/s]
 78%|███████▊  | 1215/1563 [13:21<03:50,  1.51it/s]
 78%|███████▊  | 1216/1563 [13:22<04:08,  1.40it/s]
 78%|███████▊  | 1217/1563 [13:23<03:40,  1.57it/s]
 78%|███████▊  | 1218/1563 [13:23<03:45,  1.53it/s]
 78%|███████▊  | 1219/1563 [13:24<04:03,  1.41it/s]
 78%|███████▊  | 1220/1563 [13:25<03:46,  1.51it/s]
 78%|███████▊  | 1221/1563 [13:26<03:51,  1.48it/s]
 78%|███████▊  | 1222/1563 [13:26<03:31,  1.61it/s]
 78%|███████▊  | 1223/1563 [13:27<03:51,  1.47it/s]
 78%|███████▊  | 1224/1563 [13:28<04:00,  1.41it/s]
 78%|███████▊  | 1225/1563 [13:28<03:48,  1.48it/s]
 78%|███████▊  | 1226/1563 [13:29<03:28,  1.62it/s]
 79%|███████▊  | 1227/1563 [13:29<03:38,  1.53it/s]
 79%|███████▊  | 1228/1563 [13:30<03:14,  1.73it/s]
 79%|███████▊  | 1229/1563 [13:31<03:28,  1.60it/s]
 79%|███████▊  | 1230/1563 [13:31<03:25,  1.62it/s]
 79%|███████▉  | 1231/1563 [13:32<03:48,  1.45it/s]
 79%|███████▉  | 1232/1563 [13:33<03:54,  1.41it/s]
 79%|███████▉  | 1233/1563 [13:33<03:35,  1.53it/s]
 79%|███████▉  | 1234/1563 [13:34<03:17,  1.67it/s]
 79%|███████▉  | 1235/1563 [13:34<02:56,  1.86it/s]
 79%|███████▉  | 1236/1563 [13:35<03:18,  1.65it/s]
 79%|███████▉  | 1237/1563 [13:35<03:01,  1.80it/s]
 79%|███████▉  | 1238/1563 [13:36<03:26,  1.57it/s]
 79%|███████▉  | 1239/1563 [13:37<03:32,  1.52it/s]
 79%|███████▉  | 1240/1563 [13:37<03:14,  1.66it/s]
 79%|███████▉  | 1241/1563 [13:38<03:37,  1.48it/s]
 79%|███████▉  | 1242/1563 [13:39<03:12,  1.67it/s]
 80%|███████▉  | 1243/1563 [13:39<03:19,  1.61it/s]
 80%|███████▉  | 1244/1563 [13:40<03:25,  1.55it/s]
 80%|███████▉  | 1245/1563 [13:41<03:17,  1.61it/s]
 80%|███████▉  | 1246/1563 [13:41<02:59,  1.77it/s]
 80%|███████▉  | 1247/1563 [13:42<03:18,  1.59it/s]
 80%|███████▉  | 1248/1563 [13:43<03:31,  1.49it/s]
 80%|███████▉  | 1249/1563 [13:43<03:37,  1.45it/s]
 80%|███████▉  | 1250/1563 [13:44<03:39,  1.43it/s]
{'loss': 0.1414, 'grad_norm': 0.8828125, 'learning_rate': 4.0179142674344215e-06, 'epoch': 0.8}

 80%|████████  | 1251/1563 [13:45<03:53,  1.34it/s]
 80%|████████  | 1252/1563 [13:46<04:01,  1.29it/s]
 80%|████████  | 1253/1563 [13:46<03:43,  1.38it/s]
 80%|████████  | 1254/1563 [13:47<03:20,  1.54it/s]
 80%|████████  | 1255/1563 [13:47<03:09,  1.62it/s]
 80%|████████  | 1256/1563 [13:48<03:22,  1.51it/s]
 80%|████████  | 1257/1563 [13:49<03:17,  1.55it/s]
 80%|████████  | 1258/1563 [13:49<03:18,  1.53it/s]
 81%|████████  | 1259/1563 [13:50<03:05,  1.64it/s]
 81%|████████  | 1260/1563 [13:51<03:22,  1.49it/s]
 81%|████████  | 1261/1563 [13:51<03:22,  1.49it/s]
 81%|████████  | 1262/1563 [13:52<03:11,  1.58it/s]
 81%|████████  | 1263/1563 [13:52<02:54,  1.72it/s]
 81%|████████  | 1264/1563 [13:53<03:00,  1.65it/s]
 81%|████████  | 1265/1563 [13:54<03:24,  1.46it/s]
 81%|████████  | 1266/1563 [13:55<03:25,  1.44it/s]
 81%|████████  | 1267/1563 [13:55<03:22,  1.46it/s]
 81%|████████  | 1268/1563 [13:56<03:20,  1.47it/s]
 81%|████████  | 1269/1563 [13:57<03:05,  1.58it/s]
 81%|████████▏ | 1270/1563 [13:57<03:22,  1.45it/s]
 81%|████████▏ | 1271/1563 [13:58<03:20,  1.46it/s]
 81%|████████▏ | 1272/1563 [13:59<03:19,  1.46it/s]
 81%|████████▏ | 1273/1563 [13:59<03:15,  1.48it/s]
 82%|████████▏ | 1274/1563 [14:00<03:31,  1.37it/s]
 82%|████████▏ | 1275/1563 [14:01<03:25,  1.40it/s]
 82%|████████▏ | 1276/1563 [14:02<03:38,  1.31it/s]
 82%|████████▏ | 1277/1563 [14:03<03:45,  1.27it/s]
 82%|████████▏ | 1278/1563 [14:03<03:16,  1.45it/s]
 82%|████████▏ | 1279/1563 [14:04<02:56,  1.61it/s]
 82%|████████▏ | 1280/1563 [14:04<03:15,  1.45it/s]
 82%|████████▏ | 1281/1563 [14:05<02:53,  1.63it/s]
 82%|████████▏ | 1282/1563 [14:06<03:11,  1.46it/s]
 82%|████████▏ | 1283/1563 [14:06<02:57,  1.58it/s]
 82%|████████▏ | 1284/1563 [14:07<03:11,  1.46it/s]
 82%|████████▏ | 1285/1563 [14:07<02:55,  1.58it/s]
 82%|████████▏ | 1286/1563 [14:08<02:39,  1.74it/s]
 82%|████████▏ | 1287/1563 [14:09<02:54,  1.58it/s]
 82%|████████▏ | 1288/1563 [14:09<02:52,  1.59it/s]
 82%|████████▏ | 1289/1563 [14:10<02:36,  1.75it/s]
 83%|████████▎ | 1290/1563 [14:10<02:27,  1.85it/s]
 83%|████████▎ | 1291/1563 [14:11<02:44,  1.65it/s]
 83%|████████▎ | 1292/1563 [14:12<02:51,  1.58it/s]
 83%|████████▎ | 1293/1563 [14:12<02:33,  1.76it/s]
 83%|████████▎ | 1294/1563 [14:13<02:54,  1.54it/s]
 83%|████████▎ | 1295/1563 [14:14<03:10,  1.41it/s]
 83%|████████▎ | 1296/1563 [14:15<03:22,  1.32it/s]
 83%|████████▎ | 1297/1563 [14:15<03:04,  1.44it/s]
 83%|████████▎ | 1298/1563 [14:16<03:06,  1.42it/s]
 83%|████████▎ | 1299/1563 [14:16<02:48,  1.56it/s]
 83%|████████▎ | 1300/1563 [14:17<02:33,  1.71it/s]
{'loss': 0.1409, 'grad_norm': 8.8125, 'learning_rate': 3.378119001919386e-06, 'epoch': 0.83}

 86%|████████▋ | 1350/1563 [14:49<02:11,  1.62it/s]
{'loss': 0.1395, 'grad_norm': 12.1875, 'learning_rate': 2.738323736404351e-06, 'epoch': 0.86}

 90%|████████▉ | 1400/1563 [15:21<01:40,  1.62it/s]
{'loss': 0.1385, 'grad_norm': 0.73828125, 'learning_rate': 2.0985284708893156e-06, 'epoch': 0.9}

 93%|█████████▎| 1450/1563 [15:54<01:02,  1.80it/s]
{'loss': 0.1388, 'grad_norm': 14.0625, 'learning_rate': 1.4587332053742803e-06, 'epoch': 0.93}

 96%|█████████▌| 1500/1563 [16:26<00:40,  1.57it/s]
{'loss': 0.1387, 'grad_norm': 1.71875, 'learning_rate': 8.18937939859245e-07, 'epoch': 0.96}

 99%|█████████▉| 1550/1563 [16:59<00:08,  1.53it/s]
{'loss': 0.1407, 'grad_norm': 2.4375, 'learning_rate': 1.7914267434420988e-07, 'epoch': 0.99}

100%|██████████| 1563/1563 [17:08<00:00,  1.51it/s]
{'train_runtime': 1033.8148, 'train_samples_per_second': 193.458, 'train_steps_per_second': 1.512, 'train_loss': 0.2836286289449388, 'epoch': 1.0}

100%|██████████| 1563/1563 [17:12<00:00,  1.51it/s]

training_args.bin: 100%|██████████| 5.43k/5.43k [00:00<00:00, 41.9kB/s]
tokenizer.model: 100%|██████████| 4.69M/4.69M [00:00<00:00, 12.1MB/s]

model.safetensors: 100%|██████████| 2.00G/2.00G [00:33<00:00, 59.2MB/s]
Upload 3 LFS files: 100%|██████████| 3/3 [00:34<00:00, 11.34s/it]