YWZBrandon: End of training (8c97474 verified)
[2025-05-14 21:43:37] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
[2025-05-14 21:43:37] Chat mode disabled
[2025-05-14 21:43:37] Model size is 3B or smaller (1 B). Using full fine-tuning.
[2025-05-14 21:43:37] No QA format data will be used
[2025-05-14 21:43:37] Limiting dataset size to: 100 samples
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] Starting training for model: google/gemma-3-1b-pt
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] CUDA_VISIBLE_DEVICES: 0,1,2,3
[2025-05-14 21:43:37] WANDB_PROJECT: wikidyk-ar
[2025-05-14 21:43:37] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json
[2025-05-14 21:43:37] Global Batch Size: 128
[2025-05-14 21:43:37] Data Size: 100
[2025-05-14 21:43:37] Executing command: torchrun --nproc_per_node "4" --master-port 29581 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json" --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy steps --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false" --ds_size 100
[2025-05-14 21:43:37] Training started at Wed May 14 21:43:37 UTC 2025
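The Global Batch Size of 128 reported above is the product of the launch flags: 4 processes, 32 samples per device, 1 accumulation step. Below is a minimal sketch of that arithmetic. It assumes a single node and assumes that `src/train.py` does not rescale the batch internally.

```python
def global_batch_size(nproc_per_node: int, per_device_batch: int,
                      grad_accum_steps: int, nnodes: int = 1) -> int:
    """Samples consumed per optimizer step across all data-parallel ranks."""
    return nnodes * nproc_per_node * per_device_batch * grad_accum_steps


# Values from the logged command: --nproc_per_node 4,
# --per_device_train_batch_size 32, --gradient_accumulation_steps 1.
print(global_batch_size(4, 32, 1))  # 128
```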
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792]
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
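torchrun sets `OMP_NUM_THREADS` to 1 for each process unless you set it yourself. A common starting point for tuning is to split the node's cores evenly across the local ranks. The sketch below treats that as a heuristic, not a torchrun rule. The node's core count does not appear in this log, so the 64-core figure is hypothetical.

```python
import os
from typing import Optional


def omp_threads_per_process(nproc_per_node: int,
                            cpu_count: Optional[int] = None) -> int:
    """Split the node's CPU cores evenly across local ranks, never below 1."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores // nproc_per_node)


# Hypothetical 64-core node with the 4 ranks used in this run.
print(omp_threads_per_process(4, cpu_count=64))  # 16
```

You would export the result as `OMP_NUM_THREADS` before calling torchrun.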
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
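The logged total fits a simple composition: each of the 100 fact examples is repeated 1000 times, and the 100,000 QA examples are used once. Dividing by the global batch size of 128 and keeping the final partial batch gives the 1563 steps shown by the progress bar. Below is a minimal sketch of both calculations. The function names are illustrative.

```python
import math


def total_examples(num_qa: int, num_facts: int, upsample: int) -> int:
    """QA examples counted once; each fact example repeated `upsample` times."""
    return num_qa + num_facts * upsample


def steps_per_epoch(num_examples: int, global_batch: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / global_batch)


n = total_examples(100_000, 100, 1000)
print(n, steps_per_epoch(n, 128))  # 200000 1563
```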
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
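`--resume_from_checkpoint True` found nothing to resume, presumably because the freshly created output directory held no checkpoint yet. The log does not show how `src/train.py` performs that lookup.

The sketch below assumes the Trainer's `checkpoint-<step>` subdirectory naming. It mirrors the job that `transformers.trainer_utils.get_last_checkpoint` is commonly used for. The helper name `find_last_checkpoint` is hypothetical.

```python
import os
import re
import tempfile
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    found = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            found.append((int(match.group(1)), name))
    if not found:
        return None  # caller would log "starting training from scratch"
    return os.path.join(output_dir, max(found)[1])


with tempfile.TemporaryDirectory() as demo_dir:
    print(find_last_checkpoint(demo_dir))  # None
    for step in (500, 10000, 2000):
        os.makedirs(os.path.join(demo_dir, f"checkpoint-{step}"))
    print(os.path.basename(find_last_checkpoint(demo_dir)))  # checkpoint-10000
```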
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
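Passing `--run_name` (a `TrainingArguments` field) gives the W&B run a label distinct from `output_dir`. The sketch below derives a name from the last path component, with an optional suffix. That naming convention is a hypothetical choice; this run did not use it.

```python
from pathlib import PurePosixPath


def run_name_from_output_dir(output_dir: str, suffix: str = "") -> str:
    """Hypothetical convention: last path component, optionally suffixed."""
    name = PurePosixPath(output_dir).name
    return f"{name}-{suffix}" if suffix else name


print(run_name_from_output_dir(
    "train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000"))
# google_gemma-3-1b-pt_qa_ds100_upsample1000
```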
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
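The warnings above point in two directions:

- The `use_flash_attention_2=True` flag is deprecated in favour of `attn_implementation`.
- Gemma 3 training is recommended with `eager` rather than `flash_attention_2`.

Below is a minimal sketch of a hypothetical helper that picks the string to pass as `attn_implementation`. The `sdpa` fallback for non-flash runs is an assumption; the log does not prescribe it.

```python
def choose_attn_implementation(model_name: str, use_flash_attention_2: bool) -> str:
    """Map the deprecated boolean flag to an `attn_implementation` string.

    Gemma 3 models get "eager", following the training-time warning.
    """
    if "gemma-3" in model_name.lower():
        return "eager"
    return "flash_attention_2" if use_flash_attention_2 else "sdpa"


# The chosen value would then be passed when loading, e.g.:
#   AutoModelForCausalLM.from_pretrained(model_name, attn_implementation=impl)
print(choose_attn_implementation("google/gemma-3-1b-pt", True))  # eager
```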
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250514_214351-thkr8ndb
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/thkr8ndb
0%| | 0/1563 [00:00<?, ?it/s]
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
[rank2]:[W514 21:43:53.328500884 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W514 21:43:53.333029675 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank3]:[W514 21:43:53.336456929 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank0]:[W514 21:43:53.339178719 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
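You can silence the DDP warning by setting the `TrainingArguments` field `ddp_find_unused_parameters` explicitly. Left unset, the Trainer chose `True` for this run.

The sketch below assumes `src/train.py` parses `TrainingArguments` from the command line. Its `--bf16` and `--save_strategy` flags suggest it does, but the script is not shown. The command is abridged from the logged one.

```python
import shlex
from typing import List

# Abridged from the logged torchrun command; the other flags stay as logged.
BASE_CMD = [
    "torchrun", "--nproc_per_node", "4", "--master-port", "29581",
    "src/train.py",
    "--model_name_or_path", "google/gemma-3-1b-pt",
    "--per_device_train_batch_size", "32",
    "--gradient_accumulation_steps", "1",
]


def with_ddp_find_unused(cmd: List[str], find_unused: bool = False) -> List[str]:
    """Return a new argv with ddp_find_unused_parameters set explicitly."""
    return cmd + ["--ddp_find_unused_parameters", str(find_unused)]


print(shlex.join(with_ddp_find_unused(BASE_CMD)))
```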
3%|▎ | 50/1563 [00:35<16:50, 1.50it/s]
{'loss': 3.5671, 'grad_norm': 996.0, 'learning_rate': 1.9373000639795267e-05, 'epoch': 0.03}
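The logged learning rates fit a linear decay from `2e-5` to zero over 1563 steps with no warmup. Each logged value is the rate applied during the logged step itself: `lr = 2e-5 * (1563 - (step - 1)) / 1563`. The check below assumes the Trainer's default `linear` scheduler with zero warmup steps. That matches the absence of any warmup flag in the command.

```python
import math


def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 1563) -> float:
    """Rate applied during optimizer step `step` (1-indexed) under linear decay
    to zero with no warmup; the scheduler has advanced step - 1 times."""
    return base_lr * (total_steps - (step - 1)) / total_steps


# (step, logged learning_rate) pairs copied from the log.
LOGGED = [
    (50, 1.9373000639795267e-05),
    (100, 1.8733205374280233e-05),
    (150, 1.8093410108765196e-05),
    (500, 1.361484325015995e-05),
]
for step, logged_lr in LOGGED:
    print(step, math.isclose(linear_lr(step), logged_lr, rel_tol=1e-9))  # True
```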
6%|▋ | 100/1563 [01:08<16:51, 1.45it/s]
{'loss': 0.7259, 'grad_norm': 15.6875, 'learning_rate': 1.8733205374280233e-05, 'epoch': 0.06}
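In each progress-bar entry, `[elapsed<remaining, rate]` are related. tqdm estimates the remaining time as the iterations left divided by its current, smoothed rate. The small sketch below reproduces that estimate. The displayed rate is rounded to two decimals, so recomputing from it lands close to the logged value but not exactly on it.

```python
def tqdm_remaining_seconds(n: int, total: int, rate_it_per_s: float) -> float:
    """Estimated seconds left: remaining iterations over the current rate."""
    return (total - n) / rate_it_per_s


def fmt_mmss(seconds: float) -> str:
    """Format seconds as MM:SS, as tqdm shows intervals under an hour."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d}:{secs:02d}"


# Snapshot from the log: "100/1563 [01:08<16:51, 1.45it/s]"
print(fmt_mmss(tqdm_remaining_seconds(100, 1563, 1.45)))  # 16:49 (log shows 16:51)
```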
10%|▉ | 150/1563 [01:41<17:39, 1.33it/s]
{'loss': 0.2407, 'grad_norm': 35.75, 'learning_rate': 1.8093410108765196e-05, 'epoch': 0.1}
13%|█▎ | 200/1563 [02:15<15:31, 1.46it/s]
{'loss': 0.2004, 'grad_norm': 15.75, 'learning_rate': 1.7453614843250163e-05, 'epoch': 0.13}
16%|█▌ | 250/1563 [02:49<13:57, 1.57it/s]
{'loss': 0.1715, 'grad_norm': 46.75, 'learning_rate': 1.6813819577735126e-05, 'epoch': 0.16}
19%|█▉ | 300/1563 [03:23<16:56, 1.24it/s]
{'loss': 0.1797, 'grad_norm': 426.0, 'learning_rate': 1.6174024312220092e-05, 'epoch': 0.19}
22%|██▏ | 350/1563 [03:57<15:11, 1.33it/s]
{'loss': 0.1763, 'grad_norm': 15.5, 'learning_rate': 1.5534229046705055e-05, 'epoch': 0.22}
26%|██▌ | 400/1563 [04:31<12:21, 1.57it/s]
{'loss': 0.161, 'grad_norm': 11.1875, 'learning_rate': 1.4894433781190021e-05, 'epoch': 0.26}
29%|██▉ | 450/1563 [05:03<12:29, 1.48it/s]
{'loss': 0.1574, 'grad_norm': 10.0, 'learning_rate': 1.4254638515674986e-05, 'epoch': 0.29}
32%|███▏ | 500/1563 [05:34<12:29, 1.42it/s]
{'loss': 0.15, 'grad_norm': 14.0625, 'learning_rate': 1.361484325015995e-05, 'epoch': 0.32}
35%|███▌ | 550/1563 [06:06<09:48, 1.72it/s] {'loss': 0.1867, 'grad_norm': 13.75, 'learning_rate': 1.2975047984644915e-05, 'epoch': 0.35}
38%|███▊ | 600/1563 [06:41<10:46, 1.49it/s] {'loss': 0.1441, 'grad_norm': 12.9375, 'learning_rate': 1.233525271912988e-05, 'epoch': 0.38}
42%|████▏ | 650/1563 [07:13<10:52, 1.40it/s] {'loss': 0.1872, 'grad_norm': 23.0, 'learning_rate': 1.1695457453614845e-05, 'epoch': 0.42}
45%|████▍ | 700/1563 [07:43<07:07, 2.02it/s] {'loss': 0.1618, 'grad_norm': 16.625, 'learning_rate': 1.105566218809981e-05, 'epoch': 0.45}
48%|████▊ | 750/1563 [08:15<08:16, 1.64it/s] {'loss': 0.1435, 'grad_norm': 16.0, 'learning_rate': 1.0415866922584774e-05, 'epoch': 0.48}
51%|█████ | 800/1563 [08:48<09:27, 1.34it/s] {'loss': 0.1442, 'grad_norm': 1.359375, 'learning_rate': 9.776071657069739e-06, 'epoch': 0.51}
54%|█████▍ | 850/1563 [09:22<07:55, 1.50it/s] {'loss': 0.1416, 'grad_norm': 1.0859375, 'learning_rate': 9.136276391554704e-06, 'epoch': 0.54}
58%|█████▊ | 900/1563 [09:52<06:03, 1.83it/s] {'loss': 0.1478, 'grad_norm': 5.09375, 'learning_rate': 8.496481126039668e-06, 'epoch': 0.58}
61%|██████ | 950/1563 [10:22<05:15, 1.94it/s] {'loss': 0.1462, 'grad_norm': 27.75, 'learning_rate': 7.856685860524633e-06, 'epoch': 0.61}
64%|██████▍ | 1000/1563 [10:56<06:25, 1.46it/s] {'loss': 0.1499, 'grad_norm': 0.921875, 'learning_rate': 7.216890595009598e-06, 'epoch': 0.64}
67%|██████▋ | 1050/1563 [11:29<04:28, 1.91it/s] {'loss': 0.1438, 'grad_norm': 13.25, 'learning_rate': 6.577095329494563e-06, 'epoch': 0.67}
70%|███████ | 1100/1563 [12:04<05:33, 1.39it/s] {'loss': 0.1419, 'grad_norm': 9.1875, 'learning_rate': 5.937300063979527e-06, 'epoch': 0.7}
74%|███████▎ | 1150/1563 [12:39<05:05, 1.35it/s] {'loss': 0.1443, 'grad_norm': 3.234375, 'learning_rate': 5.297504798464492e-06, 'epoch': 0.74}
77%|███████▋ | 1200/1563 [13:12<03:26, 1.75it/s] {'loss': 0.1396, 'grad_norm': 1.1015625, 'learning_rate': 4.657709532949457e-06, 'epoch': 0.77}
80%|███████▉ | 1250/1563 [13:44<03:39, 1.43it/s] {'loss': 0.1414, 'grad_norm': 0.8828125, 'learning_rate': 4.0179142674344215e-06, 'epoch': 0.8}
83%|████████▎ | 1300/1563 [14:17<02:33, 1.71it/s] {'loss': 0.1409, 'grad_norm': 8.8125, 'learning_rate': 3.378119001919386e-06, 'epoch': 0.83}
86%|████████▋ | 1350/1563 [14:49<02:11, 1.62it/s] {'loss': 0.1395, 'grad_norm': 12.1875, 'learning_rate': 2.738323736404351e-06, 'epoch': 0.86}
90%|████████▉ | 1400/1563 [15:21<01:40, 1.62it/s] {'loss': 0.1385, 'grad_norm': 0.73828125, 'learning_rate': 2.0985284708893156e-06, 'epoch': 0.9}
93%|█████████▎| 1450/1563 [15:54<01:02, 1.80it/s] {'loss': 0.1388, 'grad_norm': 14.0625, 'learning_rate': 1.4587332053742803e-06, 'epoch': 0.93}
96%|█████████▌| 1500/1563 [16:26<00:40, 1.57it/s] {'loss': 0.1387, 'grad_norm': 1.71875, 'learning_rate': 8.18937939859245e-07, 'epoch': 0.96}
99%|█████████▉| 1550/1563 [16:59<00:08, 1.53it/s] {'loss': 0.1407, 'grad_norm': 2.4375, 'learning_rate': 1.7914267434420988e-07, 'epoch': 0.99}
100%|██████████| 1563/1563 [17:08<00:00, 1.51it/s] {'train_runtime': 1033.8148, 'train_samples_per_second': 193.458, 'train_steps_per_second': 1.512, 'train_loss': 0.2836286289449388, 'epoch': 1.0}
100%|██████████| 1563/1563 [17:12<00:00, 1.51it/s]
training_args.bin: 100%|██████████| 5.43k/5.43k [00:00<00:00, 41.9kB/s]
tokenizer.model: 100%|██████████| 4.69M/4.69M [00:00<00:00, 12.1MB/s]
model.safetensors: 100%|██████████| 2.00G/2.00G [00:33<00:00, 59.2MB/s]
Upload 3 LFS files: 100%|██████████| 3/3 [00:34<00:00, 11.34s/it]