[2025-05-14 21:43:37] Created output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
[2025-05-14 21:43:37] Chat mode disabled
[2025-05-14 21:43:37] Model size is 3B or smaller (1 B). Using full fine-tuning.
[2025-05-14 21:43:37] No QA format data will be used
[2025-05-14 21:43:37] Limiting dataset size to: 100 samples
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] Starting training for model: google/gemma-3-1b-pt
[2025-05-14 21:43:37] =======================================
[2025-05-14 21:43:37] CUDA_VISIBLE_DEVICES: 0,1,2,3
[2025-05-14 21:43:37] WANDB_PROJECT: wikidyk-ar
[2025-05-14 21:43:37] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json
[2025-05-14 21:43:37] Global Batch Size: 128
[2025-05-14 21:43:37] Data Size: 100
[2025-05-14 21:43:37] Executing command: torchrun --nproc_per_node "4" --master-port 29581 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2_trainqas.json" --output_dir "train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy steps --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false" --ds_size 100
[2025-05-14 21:43:37] Training started at Wed May 14 21:43:37 UTC 2025
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792]
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0514 21:43:38.845000 618618 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
WARNING:root:Output directory: train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 100000 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 200000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
Checkpoint missing; starting training from scratch
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250514_214351-thkr8ndb
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results_pred_mask/google_gemma-3-1b-pt_qa_ds100_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/thkr8ndb
0%| | 0/1563 [00:00<?, ?it/s]It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `flash_attention_2`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
[rank2]:[W514 21:43:53.328500884 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W514 21:43:53.333029675 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank3]:[W514 21:43:53.336456929 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank0]:[W514 21:43:53.339178719 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
0%| | 1/1563 [00:02<58:53, 2.26s/it]
0%| | 2/1563 [00:02<31:34, 1.21s/it]
0%| | 3/1563 [00:03<25:28, 1.02it/s]
0%| | 4/1563 [00:04<29:51, 1.15s/it]
0%| | 5/1563 [00:05<23:51, 1.09it/s]
0%| | 6/1563 [00:06<23:17, 1.11it/s]
0%| | 7/1563 [00:06<21:42, 1.19it/s]
1%| | 8/1563 [00:07<22:32, 1.15it/s]
1%| | 9/1563 [00:08<21:40, 1.19it/s]
1%| | 10/1563 [00:09<18:39, 1.39it/s]
1%| | 11/1563 [00:09<16:43, 1.55it/s]
1%| | 12/1563 [00:10<18:18, 1.41it/s]
1%| | 13/1563 [00:10<17:12, 1.50it/s]
1%| | 14/1563 [00:11<17:28, 1.48it/s]
1%| | 15/1563 [00:12<17:57, 1.44it/s]
1%| | 16/1563 [00:13<17:49, 1.45it/s]
1%| | 17/1563 [00:13<17:00, 1.52it/s]
1%| | 18/1563 [00:14<15:29, 1.66it/s]
1%| | 19/1563 [00:14<17:11, 1.50it/s]
1%|β | 20/1563 [00:15<16:15, 1.58it/s]
1%|β | 21/1563 [00:16<16:34, 1.55it/s]
1%|β | 22/1563 [00:17<22:29, 1.14it/s]
1%|β | 23/1563 [00:18<20:43, 1.24it/s]
2%|β | 24/1563 [00:18<18:00, 1.42it/s]
2%|β | 25/1563 [00:19<17:10, 1.49it/s]
2%|β | 26/1563 [00:19<15:25, 1.66it/s]
2%|β | 27/1563 [00:20<17:09, 1.49it/s]
2%|β | 28/1563 [00:21<17:14, 1.48it/s]
2%|β | 29/1563 [00:21<16:29, 1.55it/s]
2%|β | 30/1563 [00:22<15:06, 1.69it/s]
2%|β | 31/1563 [00:22<14:10, 1.80it/s]
2%|β | 32/1563 [00:23<16:17, 1.57it/s]
2%|β | 33/1563 [00:24<20:09, 1.26it/s]
2%|β | 34/1563 [00:25<17:39, 1.44it/s]
2%|β | 35/1563 [00:25<15:57, 1.60it/s]
2%|β | 36/1563 [00:26<15:51, 1.60it/s]
2%|β | 37/1563 [00:26<15:29, 1.64it/s]
2%|β | 38/1563 [00:27<16:19, 1.56it/s]
2%|β | 39/1563 [00:28<16:26, 1.54it/s]
3%|β | 40/1563 [00:28<16:19, 1.56it/s]
3%|β | 41/1563 [00:29<15:50, 1.60it/s]
3%|β | 42/1563 [00:30<16:59, 1.49it/s]
3%|β | 43/1563 [00:30<17:21, 1.46it/s]
3%|β | 44/1563 [00:31<18:16, 1.38it/s]
3%|β | 45/1563 [00:32<17:32, 1.44it/s]
3%|β | 46/1563 [00:32<15:54, 1.59it/s]
3%|β | 47/1563 [00:33<15:21, 1.65it/s]
3%|β | 48/1563 [00:33<14:43, 1.71it/s]
3%|β | 49/1563 [00:34<15:37, 1.62it/s]
3%|β | 50/1563 [00:35<16:50, 1.50it/s]
{'loss': 3.5671, 'grad_norm': 996.0, 'learning_rate': 1.9373000639795267e-05, 'epoch': 0.03}
3%|β | 50/1563 [00:35<16:50, 1.50it/s]
3%|β | 51/1563 [00:36<17:52, 1.41it/s]
3%|β | 52/1563 [00:36<16:09, 1.56it/s]
3%|β | 53/1563 [00:37<17:24, 1.45it/s]
3%|β | 54/1563 [00:38<18:36, 1.35it/s]
4%|β | 55/1563 [00:39<19:03, 1.32it/s]
4%|β | 56/1563 [00:39<17:03, 1.47it/s]
4%|β | 57/1563 [00:41<21:41, 1.16it/s]
4%|β | 58/1563 [00:41<20:19, 1.23it/s]
4%|β | 59/1563 [00:42<17:54, 1.40it/s]
4%|β | 60/1563 [00:42<16:05, 1.56it/s]
4%|β | 61/1563 [00:43<17:34, 1.42it/s]
4%|β | 62/1563 [00:44<18:03, 1.38it/s]
4%|β | 63/1563 [00:44<17:18, 1.44it/s]
4%|β | 64/1563 [00:45<17:29, 1.43it/s]
4%|β | 65/1563 [00:46<16:14, 1.54it/s]
4%|β | 66/1563 [00:46<16:43, 1.49it/s]
4%|β | 67/1563 [00:47<15:01, 1.66it/s]
4%|β | 68/1563 [00:47<14:43, 1.69it/s]
4%|β | 69/1563 [00:48<13:42, 1.82it/s]
4%|β | 70/1563 [00:48<14:28, 1.72it/s]
5%|β | 71/1563 [00:49<16:06, 1.54it/s]
5%|β | 72/1563 [00:50<15:00, 1.66it/s]
5%|β | 73/1563 [00:50<15:14, 1.63it/s]
5%|β | 74/1563 [00:51<16:11, 1.53it/s]
5%|β | 75/1563 [00:52<16:13, 1.53it/s]
5%|β | 76/1563 [00:53<16:27, 1.51it/s]
5%|β | 77/1563 [00:53<14:50, 1.67it/s]
5%|β | 78/1563 [00:54<15:09, 1.63it/s]
5%|β | 79/1563 [00:54<13:55, 1.78it/s]
5%|β | 80/1563 [00:55<15:26, 1.60it/s]
5%|β | 81/1563 [00:56<15:48, 1.56it/s]
5%|β | 82/1563 [00:56<16:44, 1.47it/s]
5%|β | 83/1563 [00:57<15:32, 1.59it/s]
5%|β | 84/1563 [00:57<14:18, 1.72it/s]
5%|β | 85/1563 [00:58<16:08, 1.53it/s]
6%|β | 86/1563 [00:59<16:27, 1.50it/s]
6%|β | 87/1563 [00:59<14:37, 1.68it/s]
6%|β | 88/1563 [01:00<14:22, 1.71it/s]
6%|β | 89/1563 [01:01<16:11, 1.52it/s]
6%|β | 90/1563 [01:01<14:48, 1.66it/s]
6%|β | 91/1563 [01:02<16:03, 1.53it/s]
6%|β | 92/1563 [01:03<17:37, 1.39it/s]
6%|β | 93/1563 [01:03<16:19, 1.50it/s]
6%|β | 94/1563 [01:04<14:42, 1.66it/s]
6%|β | 95/1563 [01:05<17:03, 1.43it/s]
6%|β | 96/1563 [01:05<16:24, 1.49it/s]
6%|β | 97/1563 [01:06<16:37, 1.47it/s]
6%|β | 98/1563 [01:07<17:51, 1.37it/s]
6%|β | 99/1563 [01:07<16:22, 1.49it/s]
6%|β | 100/1563 [01:08<16:51, 1.45it/s]
{'loss': 0.7259, 'grad_norm': 15.6875, 'learning_rate': 1.8733205374280233e-05, 'epoch': 0.06}
6%|β | 100/1563 [01:08<16:51, 1.45it/s]
6%|β | 101/1563 [01:09<16:29, 1.48it/s]
7%|β | 102/1563 [01:09<16:37, 1.47it/s]
7%|β | 103/1563 [01:10<17:50, 1.36it/s]
7%|β | 104/1563 [01:11<16:26, 1.48it/s]
7%|β | 105/1563 [01:11<16:29, 1.47it/s]
7%|β | 106/1563 [01:12<14:43, 1.65it/s]
7%|β | 107/1563 [01:13<16:21, 1.48it/s]
7%|β | 108/1563 [01:13<15:55, 1.52it/s]
7%|β | 109/1563 [01:14<16:31, 1.47it/s]
7%|β | 110/1563 [01:15<16:16, 1.49it/s]
7%|β | 111/1563 [01:15<14:37, 1.65it/s]
7%|β | 112/1563 [01:16<15:08, 1.60it/s]
7%|β | 113/1563 [01:16<14:23, 1.68it/s]
7%|β | 114/1563 [01:17<14:49, 1.63it/s]
7%|β | 115/1563 [01:18<13:39, 1.77it/s]
7%|β | 116/1563 [01:18<15:36, 1.55it/s]
7%|β | 117/1563 [01:19<14:38, 1.65it/s]
8%|β | 118/1563 [01:19<14:31, 1.66it/s]
8%|β | 119/1563 [01:20<13:54, 1.73it/s]
8%|β | 120/1563 [01:20<12:50, 1.87it/s]
8%|β | 121/1563 [01:21<14:24, 1.67it/s]
8%|β | 122/1563 [01:22<13:48, 1.74it/s]
8%|β | 123/1563 [01:23<15:53, 1.51it/s]
8%|β | 124/1563 [01:23<14:52, 1.61it/s]
8%|β | 125/1563 [01:24<15:52, 1.51it/s]
8%|β | 126/1563 [01:25<17:03, 1.40it/s]
8%|β | 127/1563 [01:26<18:01, 1.33it/s]
8%|β | 128/1563 [01:26<18:37, 1.28it/s]
8%|β | 129/1563 [01:27<16:26, 1.45it/s]
8%|β | 130/1563 [01:28<16:22, 1.46it/s]
8%|β | 131/1563 [01:28<17:15, 1.38it/s]
8%|β | 132/1563 [01:29<15:33, 1.53it/s]
9%|β | 133/1563 [01:29<14:53, 1.60it/s]
9%|β | 134/1563 [01:30<16:31, 1.44it/s]
9%|β | 135/1563 [01:31<16:27, 1.45it/s]
9%|β | 136/1563 [01:32<16:54, 1.41it/s]
9%|β | 137/1563 [01:32<15:07, 1.57it/s]
9%|β | 138/1563 [01:33<16:33, 1.43it/s]
9%|β | 139/1563 [01:33<14:51, 1.60it/s]
9%|β | 140/1563 [01:34<15:38, 1.52it/s]
9%|β | 141/1563 [01:35<15:49, 1.50it/s]
9%|β | 142/1563 [01:35<14:10, 1.67it/s]
9%|β | 143/1563 [01:36<14:24, 1.64it/s]
9%|β | 144/1563 [01:37<14:55, 1.58it/s]
9%|β | 145/1563 [01:37<13:48, 1.71it/s]
9%|β | 146/1563 [01:38<14:15, 1.66it/s]
9%|β | 147/1563 [01:39<15:53, 1.48it/s]
9%|β | 148/1563 [01:39<16:32, 1.43it/s]
10%|β | 149/1563 [01:40<16:38, 1.42it/s]
10%|β | 150/1563 [01:41<17:39, 1.33it/s]
{'loss': 0.2407, 'grad_norm': 35.75, 'learning_rate': 1.8093410108765196e-05, 'epoch': 0.1}
10%|β | 150/1563 [01:41<17:39, 1.33it/s]
10%|β | 151/1563 [01:41<15:54, 1.48it/s]
10%|β | 152/1563 [01:42<14:24, 1.63it/s]
10%|β | 153/1563 [01:43<16:06, 1.46it/s]
10%|β | 154/1563 [01:44<17:15, 1.36it/s]
10%|β | 155/1563 [01:44<18:07, 1.29it/s]
10%|β | 156/1563 [01:45<17:36, 1.33it/s]
10%|β | 157/1563 [01:46<15:41, 1.49it/s]
10%|β | 158/1563 [01:46<14:06, 1.66it/s]
10%|β | 159/1563 [01:47<14:46, 1.58it/s]
10%|β | 160/1563 [01:47<14:27, 1.62it/s]
10%|β | 161/1563 [01:48<15:17, 1.53it/s]
10%|β | 162/1563 [01:49<16:40, 1.40it/s]
10%|β | 163/1563 [01:49<14:47, 1.58it/s]
10%|β | 164/1563 [01:50<16:13, 1.44it/s]
11%|β | 165/1563 [01:51<15:58, 1.46it/s]
11%|β | 166/1563 [01:51<14:34, 1.60it/s]
11%|β | 167/1563 [01:52<14:04, 1.65it/s]
11%|β | 168/1563 [01:53<15:45, 1.47it/s]
11%|β | 169/1563 [01:53<15:24, 1.51it/s]
11%|β | 170/1563 [01:54<15:21, 1.51it/s]
11%|β | 171/1563 [01:55<15:33, 1.49it/s]
11%|β | 172/1563 [01:55<14:48, 1.57it/s]
11%|β | 173/1563 [01:56<15:25, 1.50it/s]
11%|β | 174/1563 [01:57<16:04, 1.44it/s]
11%|β | 175/1563 [01:58<16:10, 1.43it/s]
11%|ββ | 176/1563 [01:58<16:49, 1.37it/s]
11%|ββ | 177/1563 [01:59<15:18, 1.51it/s]
11%|ββ | 178/1563 [01:59<15:08, 1.52it/s]
11%|ββ | 179/1563 [02:00<14:00, 1.65it/s]
12%|ββ | 180/1563 [02:01<15:37, 1.48it/s]
12%|ββ | 181/1563 [02:02<16:46, 1.37it/s]
12%|ββ | 182/1563 [02:02<15:15, 1.51it/s]
12%|ββ | 183/1563 [02:03<14:02, 1.64it/s]
12%|ββ | 184/1563 [02:03<14:31, 1.58it/s]
12%|ββ | 185/1563 [02:04<15:28, 1.48it/s]
12%|ββ | 186/1563 [02:05<14:19, 1.60it/s]
12%|ββ | 187/1563 [02:05<15:10, 1.51it/s]
12%|ββ | 188/1563 [02:06<16:38, 1.38it/s]
12%|ββ | 189/1563 [02:07<17:33, 1.30it/s]
12%|ββ | 190/1563 [02:08<17:05, 1.34it/s]
12%|ββ | 191/1563 [02:08<16:31, 1.38it/s]
12%|ββ | 192/1563 [02:09<15:33, 1.47it/s]
12%|ββ | 193/1563 [02:10<16:42, 1.37it/s]
12%|ββ | 194/1563 [02:10<14:28, 1.58it/s]
12%|ββ | 195/1563 [02:11<14:44, 1.55it/s]
13%|ββ | 196/1563 [02:12<15:35, 1.46it/s]
13%|ββ | 197/1563 [02:13<16:33, 1.38it/s]
13%|ββ | 198/1563 [02:13<16:19, 1.39it/s]
13%|ββ | 199/1563 [02:14<15:22, 1.48it/s]
13%|ββ | 200/1563 [02:15<15:31, 1.46it/s]
{'loss': 0.2004, 'grad_norm': 15.75, 'learning_rate': 1.7453614843250163e-05, 'epoch': 0.13}
13%|ββ | 200/1563 [02:15<15:31, 1.46it/s]
13%|ββ | 201/1563 [02:15<15:57, 1.42it/s]
13%|ββ | 202/1563 [02:16<13:55, 1.63it/s]
13%|ββ | 203/1563 [02:17<15:24, 1.47it/s]
13%|ββ | 204/1563 [02:17<15:31, 1.46it/s]
13%|ββ | 205/1563 [02:18<14:52, 1.52it/s]
13%|ββ | 206/1563 [02:19<16:07, 1.40it/s]
13%|ββ | 207/1563 [02:19<15:53, 1.42it/s]
13%|ββ | 208/1563 [02:20<14:59, 1.51it/s]
13%|ββ | 209/1563 [02:21<16:04, 1.40it/s]
13%|ββ | 210/1563 [02:22<16:47, 1.34it/s]
13%|ββ | 211/1563 [02:22<15:35, 1.44it/s]
14%|ββ | 212/1563 [02:23<14:01, 1.60it/s]
14%|ββ | 213/1563 [02:23<12:59, 1.73it/s]
14%|ββ | 214/1563 [02:24<14:01, 1.60it/s]
14%|ββ | 215/1563 [02:25<19:18, 1.16it/s]
14%|ββ | 216/1563 [02:26<16:14, 1.38it/s]
14%|ββ | 217/1563 [02:26<14:18, 1.57it/s]
14%|ββ | 218/1563 [02:26<12:54, 1.74it/s]
14%|ββ | 219/1563 [02:27<13:47, 1.62it/s]
14%|ββ | 220/1563 [02:28<16:03, 1.39it/s]
14%|ββ | 221/1563 [02:29<14:43, 1.52it/s]
14%|ββ | 222/1563 [02:30<16:01, 1.39it/s]
14%|ββ | 223/1563 [02:30<14:35, 1.53it/s]
14%|ββ | 224/1563 [02:31<15:52, 1.41it/s]
14%|ββ | 225/1563 [02:32<15:35, 1.43it/s]
14%|ββ | 226/1563 [02:32<14:46, 1.51it/s]
15%|ββ | 227/1563 [02:33<14:25, 1.54it/s]
15%|ββ | 228/1563 [02:33<14:36, 1.52it/s]
15%|ββ | 229/1563 [02:34<15:21, 1.45it/s]
15%|ββ | 230/1563 [02:35<15:52, 1.40it/s]
15%|ββ | 231/1563 [02:36<16:19, 1.36it/s]
15%|ββ | 232/1563 [02:36<15:32, 1.43it/s]
15%|ββ | 233/1563 [02:37<16:35, 1.34it/s]
15%|ββ | 234/1563 [02:38<16:09, 1.37it/s]
15%|ββ | 235/1563 [02:39<16:59, 1.30it/s]
15%|ββ | 236/1563 [02:39<16:18, 1.36it/s]
15%|ββ | 237/1563 [02:40<16:31, 1.34it/s]
15%|ββ | 238/1563 [02:41<14:31, 1.52it/s]
15%|ββ | 239/1563 [02:41<15:41, 1.41it/s]
15%|ββ | 240/1563 [02:42<15:36, 1.41it/s]
15%|ββ | 241/1563 [02:43<14:45, 1.49it/s]
15%|ββ | 242/1563 [02:43<13:03, 1.69it/s]
16%|ββ | 243/1563 [02:44<14:04, 1.56it/s]
16%|ββ | 244/1563 [02:45<14:25, 1.52it/s]
16%|ββ | 245/1563 [02:45<13:26, 1.63it/s]
16%|ββ | 246/1563 [02:46<15:05, 1.45it/s]
16%|ββ | 247/1563 [02:47<15:54, 1.38it/s]
16%|ββ | 248/1563 [02:48<15:49, 1.38it/s]
16%|ββ | 249/1563 [02:48<13:53, 1.58it/s]
16%|ββ | 250/1563 [02:49<13:57, 1.57it/s]
{'loss': 0.1715, 'grad_norm': 46.75, 'learning_rate': 1.6813819577735126e-05, 'epoch': 0.16}
16%|ββ | 250/1563 [02:49<13:57, 1.57it/s]
16%|ββ | 251/1563 [02:49<12:42, 1.72it/s]
16%|ββ | 252/1563 [02:50<12:43, 1.72it/s]
16%|ββ | 253/1563 [02:50<14:31, 1.50it/s]
16%|ββ | 254/1563 [02:51<13:25, 1.63it/s]
16%|ββ | 255/1563 [02:52<13:25, 1.62it/s]
16%|ββ | 256/1563 [02:52<13:46, 1.58it/s]
16%|ββ | 257/1563 [02:53<15:01, 1.45it/s]
17%|ββ | 258/1563 [02:54<13:35, 1.60it/s]
17%|ββ | 259/1563 [02:54<12:30, 1.74it/s]
17%|ββ | 260/1563 [02:55<12:13, 1.78it/s]
17%|ββ | 261/1563 [02:55<13:58, 1.55it/s]
17%|ββ | 262/1563 [02:56<12:48, 1.69it/s]
17%|ββ | 263/1563 [02:56<11:57, 1.81it/s]
17%|ββ | 264/1563 [02:57<11:28, 1.89it/s]
17%|ββ | 265/1563 [02:57<11:35, 1.87it/s]
17%|ββ | 266/1563 [02:58<12:33, 1.72it/s]
17%|ββ | 267/1563 [02:59<13:18, 1.62it/s]
17%|ββ | 268/1563 [02:59<13:27, 1.60it/s]
17%|ββ | 269/1563 [03:00<14:45, 1.46it/s]
17%|ββ | 270/1563 [03:01<13:20, 1.62it/s]
17%|ββ | 271/1563 [03:01<14:19, 1.50it/s]
17%|ββ | 272/1563 [03:02<14:26, 1.49it/s]
17%|ββ | 273/1563 [03:03<14:37, 1.47it/s]
18%|ββ | 274/1563 [03:03<13:12, 1.63it/s]
18%|ββ | 275/1563 [03:04<13:28, 1.59it/s]
18%|ββ | 276/1563 [03:04<12:40, 1.69it/s]
18%|ββ | 277/1563 [03:05<13:59, 1.53it/s]
18%|ββ | 278/1563 [03:06<15:20, 1.40it/s]
18%|ββ | 279/1563 [03:07<15:50, 1.35it/s]
18%|ββ | 280/1563 [03:08<14:48, 1.44it/s]
18%|ββ | 281/1563 [03:08<15:05, 1.42it/s]
18%|ββ | 282/1563 [03:09<14:52, 1.44it/s]
18%|ββ | 283/1563 [03:10<14:43, 1.45it/s]
18%|ββ | 284/1563 [03:10<14:45, 1.44it/s]
18%|ββ | 285/1563 [03:11<15:20, 1.39it/s]
18%|ββ | 286/1563 [03:12<13:54, 1.53it/s]
18%|ββ | 287/1563 [03:12<15:09, 1.40it/s]
18%|ββ | 288/1563 [03:13<15:47, 1.35it/s]
18%|ββ | 289/1563 [03:14<16:30, 1.29it/s]
19%|ββ | 290/1563 [03:15<14:38, 1.45it/s]
19%|ββ | 291/1563 [03:15<14:59, 1.41it/s]
19%|ββ | 292/1563 [03:16<15:28, 1.37it/s]
19%|ββ | 293/1563 [03:17<16:18, 1.30it/s]
19%|ββ | 294/1563 [03:18<16:46, 1.26it/s]
19%|ββ | 295/1563 [03:19<17:08, 1.23it/s]
19%|ββ | 296/1563 [03:19<16:56, 1.25it/s]
19%|ββ | 297/1563 [03:20<17:15, 1.22it/s]
19%|ββ | 298/1563 [03:21<16:55, 1.25it/s]
19%|ββ | 299/1563 [03:22<16:52, 1.25it/s]
19%|ββ | 300/1563 [03:23<16:56, 1.24it/s]
{'loss': 0.1797, 'grad_norm': 426.0, 'learning_rate': 1.6174024312220092e-05, 'epoch': 0.19}
19%|ββ | 300/1563 [03:23<16:56, 1.24it/s]
19%|ββ | 301/1563 [03:23<16:20, 1.29it/s]
19%|ββ | 302/1563 [03:24<14:23, 1.46it/s]
19%|ββ | 303/1563 [03:24<13:27, 1.56it/s]
19%|ββ | 304/1563 [03:25<14:08, 1.48it/s]
20%|ββ | 305/1563 [03:26<14:31, 1.44it/s]
20%|ββ | 306/1563 [03:26<13:54, 1.51it/s]
20%|ββ | 307/1563 [03:27<14:57, 1.40it/s]
20%|ββ | 308/1563 [03:28<13:29, 1.55it/s]
20%|ββ | 309/1563 [03:29<14:32, 1.44it/s]
20%|ββ | 310/1563 [03:29<13:09, 1.59it/s]
20%|ββ | 311/1563 [03:30<12:50, 1.63it/s]
20%|ββ | 312/1563 [03:31<14:15, 1.46it/s]
20%|ββ | 313/1563 [03:31<13:35, 1.53it/s]
20%|ββ | 314/1563 [03:32<14:42, 1.42it/s]
20%|ββ | 315/1563 [03:33<13:56, 1.49it/s]
20%|ββ | 316/1563 [03:33<12:51, 1.62it/s]
20%|ββ | 317/1563 [03:34<13:17, 1.56it/s]
20%|ββ | 318/1563 [03:34<12:32, 1.65it/s]
20%|ββ | 319/1563 [03:35<12:54, 1.61it/s]
20%|ββ | 320/1563 [03:36<14:20, 1.45it/s]
21%|ββ | 321/1563 [03:36<14:23, 1.44it/s]
21%|ββ | 322/1563 [03:37<15:01, 1.38it/s]
21%|ββ | 323/1563 [03:38<14:33, 1.42it/s]
21%|ββ | 324/1563 [03:39<14:38, 1.41it/s]
21%|ββ | 325/1563 [03:39<14:12, 1.45it/s]
21%|ββ | 326/1563 [03:40<14:42, 1.40it/s]
21%|ββ | 327/1563 [03:41<15:25, 1.34it/s]
21%|ββ | 328/1563 [03:41<13:30, 1.52it/s]
21%|ββ | 329/1563 [03:42<14:33, 1.41it/s]
21%|ββ | 330/1563 [03:43<13:58, 1.47it/s]
21%|ββ | 331/1563 [03:43<12:31, 1.64it/s]
21%|ββ | 332/1563 [03:44<11:24, 1.80it/s]
21%|βββ | 333/1563 [03:44<12:48, 1.60it/s]
21%|βββ | 334/1563 [03:45<14:10, 1.44it/s]
21%|βββ | 335/1563 [03:46<12:57, 1.58it/s]
21%|βββ | 336/1563 [03:47<14:23, 1.42it/s]
22%|βββ | 337/1563 [03:47<13:23, 1.53it/s]
22%|βββ | 338/1563 [03:48<14:32, 1.40it/s]
22%|βββ | 339/1563 [03:49<14:30, 1.41it/s]
22%|βββ | 340/1563 [03:49<13:30, 1.51it/s]
22%|βββ | 341/1563 [03:50<14:45, 1.38it/s]
22%|βββ | 342/1563 [03:51<15:33, 1.31it/s]
22%|βββ | 343/1563 [03:52<15:00, 1.35it/s]
22%|βββ | 344/1563 [03:52<13:22, 1.52it/s]
22%|βββ | 345/1563 [03:53<14:24, 1.41it/s]
22%|βββ | 346/1563 [03:54<14:29, 1.40it/s]
22%|βββ | 347/1563 [03:55<15:21, 1.32it/s]
22%|βββ | 348/1563 [03:55<15:05, 1.34it/s]
22%|βββ | 349/1563 [03:56<15:40, 1.29it/s]
22%|βββ | 350/1563 [03:57<15:11, 1.33it/s]
{'loss': 0.1763, 'grad_norm': 15.5, 'learning_rate': 1.5534229046705055e-05, 'epoch': 0.22}
22%|βββ | 350/1563 [03:57<15:11, 1.33it/s]
22%|βββ | 351/1563 [03:58<15:22, 1.31it/s]
23%|βββ | 352/1563 [03:58<15:50, 1.27it/s]
23%|βββ | 353/1563 [03:59<16:13, 1.24it/s]
23%|βββ | 354/1563 [04:00<16:26, 1.23it/s]
23%|βββ | 355/1563 [04:01<16:10, 1.24it/s]
23%|βββ | 356/1563 [04:02<15:51, 1.27it/s]
23%|βββ | 357/1563 [04:02<14:15, 1.41it/s]
23%|βββ | 358/1563 [04:03<15:13, 1.32it/s]
23%|βββ | 359/1563 [04:04<14:31, 1.38it/s]
23%|βββ | 360/1563 [04:05<15:00, 1.34it/s]
23%|βββ | 361/1563 [04:05<15:14, 1.31it/s]
23%|βββ | 362/1563 [04:06<15:42, 1.27it/s]
23%|βββ | 363/1563 [04:07<14:53, 1.34it/s]
23%|βββ | 364/1563 [04:07<13:19, 1.50it/s]
23%|βββ | 365/1563 [04:08<12:23, 1.61it/s]
23%|βββ | 366/1563 [04:09<13:18, 1.50it/s]
23%|βββ | 367/1563 [04:09<13:40, 1.46it/s]
24%|βββ | 368/1563 [04:10<14:45, 1.35it/s]
24%|βββ | 369/1563 [04:11<13:10, 1.51it/s]
24%|βββ | 370/1563 [04:11<11:57, 1.66it/s]
24%|βββ | 371/1563 [04:12<10:57, 1.81it/s]
24%|βββ | 372/1563 [04:12<12:48, 1.55it/s]
24%|βββ | 373/1563 [04:13<11:45, 1.69it/s]
24%|βββ | 374/1563 [04:14<13:16, 1.49it/s]
24%|βββ | 375/1563 [04:15<14:10, 1.40it/s]
24%|βββ | 376/1563 [04:15<12:36, 1.57it/s]
24%|βββ | 377/1563 [04:15<11:27, 1.73it/s]
24%|βββ | 378/1563 [04:16<10:35, 1.87it/s]
24%|βββ | 379/1563 [04:17<11:48, 1.67it/s]
24%|βββ | 380/1563 [04:17<12:16, 1.61it/s]
24%|βββ | 381/1563 [04:18<13:34, 1.45it/s]
24%|βββ | 382/1563 [04:19<14:14, 1.38it/s]
25%|βββ | 383/1563 [04:20<14:51, 1.32it/s]
25%|βββ | 384/1563 [04:20<13:13, 1.49it/s]
25%|βββ | 385/1563 [04:21<13:22, 1.47it/s]
25%|βββ | 386/1563 [04:22<12:39, 1.55it/s]
25%|βββ | 387/1563 [04:22<12:14, 1.60it/s]
25%|βββ | 388/1563 [04:23<13:11, 1.48it/s]
25%|βββ | 389/1563 [04:23<12:14, 1.60it/s]
25%|βββ | 390/1563 [04:24<12:42, 1.54it/s]
25%|βββ | 391/1563 [04:25<11:47, 1.66it/s]
25%|βββ | 392/1563 [04:25<13:14, 1.47it/s]
25%|βββ | 393/1563 [04:26<14:07, 1.38it/s]
25%|βββ | 394/1563 [04:27<14:41, 1.33it/s]
25%|βββ | 395/1563 [04:28<13:05, 1.49it/s]
25%|βββ | 396/1563 [04:28<13:10, 1.48it/s]
25%|βββ | 397/1563 [04:29<12:32, 1.55it/s]
25%|βββ | 398/1563 [04:29<11:35, 1.67it/s]
26%|βββ | 399/1563 [04:30<13:00, 1.49it/s]
26%|βββ | 400/1563 [04:31<12:21, 1.57it/s]
{'loss': 0.161, 'grad_norm': 11.1875, 'learning_rate': 1.4894433781190021e-05, 'epoch': 0.26}
26%|βββ | 400/1563 [04:31<12:21, 1.57it/s]
26%|βββ | 401/1563 [04:32<13:32, 1.43it/s]
26%|βββ | 402/1563 [04:32<12:17, 1.57it/s]
26%|βββ | 403/1563 [04:33<12:32, 1.54it/s]
26%|βββ | 404/1563 [04:33<11:44, 1.65it/s]
26%|βββ | 405/1563 [04:34<12:05, 1.60it/s]
26%|βββ | 406/1563 [04:35<13:24, 1.44it/s]
26%|βββ | 407/1563 [04:36<14:06, 1.37it/s]
26%|βββ | 408/1563 [04:36<14:51, 1.30it/s]
26%|βββ | 409/1563 [04:37<13:27, 1.43it/s]
26%|βββ | 410/1563 [04:38<14:16, 1.35it/s]
26%|βββ | 411/1563 [04:38<12:56, 1.48it/s]
26%|βββ | 412/1563 [04:39<11:49, 1.62it/s]
26%|βββ | 413/1563 [04:40<12:49, 1.50it/s]
26%|βββ | 414/1563 [04:40<11:28, 1.67it/s]
27%|βββ | 415/1563 [04:41<11:20, 1.69it/s]
27%|βββ | 416/1563 [04:41<10:38, 1.80it/s]
27%|βββ | 417/1563 [04:42<12:18, 1.55it/s]
27%|βββ | 418/1563 [04:43<13:24, 1.42it/s]
27%|βββ | 419/1563 [04:44<14:15, 1.34it/s]
27%|βββ | 420/1563 [04:44<14:23, 1.32it/s]
27%|βββ | 421/1563 [04:45<13:54, 1.37it/s]
27%|βββ | 422/1563 [04:46<14:35, 1.30it/s]
27%|βββ | 423/1563 [04:46<12:49, 1.48it/s]
27%|βββ | 424/1563 [04:47<11:53, 1.60it/s]
27%|βββ | 425/1563 [04:48<13:07, 1.44it/s]
27%|βββ | 426/1563 [04:49<14:02, 1.35it/s]
27%|βββ | 427/1563 [04:49<12:18, 1.54it/s]
27%|βββ | 428/1563 [04:50<13:15, 1.43it/s]
27%|βββ | 429/1563 [04:51<13:03, 1.45it/s]
28%|βββ | 430/1563 [04:51<13:23, 1.41it/s]
28%|βββ | 431/1563 [04:52<12:00, 1.57it/s]
28%|βββ | 432/1563 [04:52<10:46, 1.75it/s]
28%|βββ | 433/1563 [04:53<10:25, 1.81it/s]
28%|βββ | 434/1563 [04:53<10:12, 1.84it/s]
28%|βββ | 435/1563 [04:54<10:20, 1.82it/s]
28%|βββ | 436/1563 [04:55<11:52, 1.58it/s]
28%|βββ | 437/1563 [04:55<11:02, 1.70it/s]
28%|βββ | 438/1563 [04:56<11:45, 1.60it/s]
28%|βββ | 439/1563 [04:56<11:15, 1.66it/s]
28%|βββ | 440/1563 [04:57<10:50, 1.73it/s]
28%|βββ | 441/1563 [04:57<10:17, 1.82it/s]
28%|βββ | 442/1563 [04:58<09:41, 1.93it/s]
28%|βββ | 443/1563 [04:58<09:28, 1.97it/s]
28%|βββ | 444/1563 [04:59<09:49, 1.90it/s]
28%|βββ | 445/1563 [05:00<11:30, 1.62it/s]
29%|βββ | 446/1563 [05:00<12:09, 1.53it/s]
29%|βββ | 447/1563 [05:01<12:46, 1.46it/s]
29%|βββ | 448/1563 [05:02<12:56, 1.44it/s]
29%|βββ | 449/1563 [05:02<12:07, 1.53it/s]
29%|βββ | 450/1563 [05:03<12:29, 1.48it/s]
{'loss': 0.1574, 'grad_norm': 10.0, 'learning_rate': 1.4254638515674986e-05, 'epoch': 0.29}
29%|βββ | 450/1563 [05:03<12:29, 1.48it/s]
29%|βββ | 451/1563 [05:04<11:21, 1.63it/s]
29%|βββ | 452/1563 [05:04<11:50, 1.56it/s]
29%|βββ | 453/1563 [05:05<10:48, 1.71it/s]
29%|βββ | 454/1563 [05:05<11:01, 1.68it/s]
29%|βββ | 455/1563 [05:06<10:35, 1.74it/s]
29%|βββ | 456/1563 [05:06<10:06, 1.83it/s]
29%|βββ | 457/1563 [05:07<09:33, 1.93it/s]
29%|βββ | 458/1563 [05:07<09:13, 2.00it/s]
29%|βββ | 459/1563 [05:08<11:04, 1.66it/s]
29%|βββ | 460/1563 [05:09<12:27, 1.48it/s]
29%|βββ | 461/1563 [05:09<11:10, 1.64it/s]
30%|βββ | 462/1563 [05:10<12:31, 1.46it/s]
30%|βββ | 463/1563 [05:11<11:06, 1.65it/s]
30%|βββ | 464/1563 [05:11<10:13, 1.79it/s]
30%|βββ | 465/1563 [05:12<11:23, 1.61it/s]
30%|βββ | 466/1563 [05:13<12:42, 1.44it/s]
30%|βββ | 467/1563 [05:13<11:45, 1.55it/s]
30%|βββ | 468/1563 [05:14<10:42, 1.70it/s]
30%|βββ | 469/1563 [05:15<12:05, 1.51it/s]
30%|βββ | 470/1563 [05:15<11:09, 1.63it/s]
30%|βββ | 471/1563 [05:16<10:37, 1.71it/s]
30%|βββ | 472/1563 [05:16<09:57, 1.83it/s]
30%|βββ | 473/1563 [05:17<11:34, 1.57it/s]
30%|βββ | 474/1563 [05:17<10:22, 1.75it/s]
30%|βββ | 475/1563 [05:18<10:05, 1.80it/s]
30%|βββ | 476/1563 [05:18<09:17, 1.95it/s]
31%|βββ | 477/1563 [05:19<09:16, 1.95it/s]
31%|βββ | 478/1563 [05:19<08:50, 2.05it/s]
31%|βββ | 479/1563 [05:20<09:21, 1.93it/s]
31%|βββ | 480/1563 [05:21<11:09, 1.62it/s]
31%|βββ | 481/1563 [05:21<11:53, 1.52it/s]
31%|βββ | 482/1563 [05:22<10:46, 1.67it/s]
31%|βββ | 483/1563 [05:23<11:17, 1.59it/s]
31%|βββ | 484/1563 [05:23<10:59, 1.64it/s]
31%|βββ | 485/1563 [05:24<10:42, 1.68it/s]
31%|βββ | 486/1563 [05:24<09:43, 1.84it/s]
31%|βββ | 487/1563 [05:25<11:05, 1.62it/s]
31%|βββ | 488/1563 [05:26<11:55, 1.50it/s]
31%|ββββ | 489/1563 [05:27<12:38, 1.42it/s]
31%|ββββ | 490/1563 [05:27<13:14, 1.35it/s]
31%|ββββ | 491/1563 [05:28<11:58, 1.49it/s]
31%|ββββ | 492/1563 [05:28<11:14, 1.59it/s]
32%|ββββ | 493/1563 [05:29<10:24, 1.71it/s]
32%|ββββ | 494/1563 [05:30<11:29, 1.55it/s]
32%|ββββ | 495/1563 [05:30<10:40, 1.67it/s]
32%|ββββ | 496/1563 [05:31<11:57, 1.49it/s]
32%|ββββ | 497/1563 [05:31<10:45, 1.65it/s]
32%|ββββ | 498/1563 [05:32<10:50, 1.64it/s]
32%|ββββ | 499/1563 [05:33<11:16, 1.57it/s]
32%|ββββ | 500/1563 [05:34<12:29, 1.42it/s]
{'loss': 0.15, 'grad_norm': 14.0625, 'learning_rate': 1.361484325015995e-05, 'epoch': 0.32}
32%|ββββ | 501/1563 [05:34<12:59, 1.36it/s]
32%|ββββ | 502/1563 [05:35<11:37, 1.52it/s]
32%|ββββ | 503/1563 [05:35<10:59, 1.61it/s]
32%|ββββ | 504/1563 [05:36<11:44, 1.50it/s]
32%|ββββ | 505/1563 [05:37<12:49, 1.38it/s]
32%|ββββ | 506/1563 [05:38<13:18, 1.32it/s]
32%|ββββ | 507/1563 [05:39<13:42, 1.28it/s]
33%|ββββ | 508/1563 [05:40<13:45, 1.28it/s]
33%|ββββ | 509/1563 [05:40<14:00, 1.25it/s]
33%|ββββ | 510/1563 [05:41<12:55, 1.36it/s]
33%|ββββ | 511/1563 [05:42<11:46, 1.49it/s]
33%|ββββ | 512/1563 [05:42<10:55, 1.60it/s]
33%|ββββ | 513/1563 [05:43<10:57, 1.60it/s]
33%|ββββ | 514/1563 [05:43<10:52, 1.61it/s]
33%|ββββ | 515/1563 [05:44<10:08, 1.72it/s]
33%|ββββ | 516/1563 [05:44<09:46, 1.79it/s]
33%|ββββ | 517/1563 [05:45<10:17, 1.70it/s]
33%|ββββ | 518/1563 [05:45<10:06, 1.72it/s]
33%|ββββ | 519/1563 [05:46<10:19, 1.69it/s]
33%|ββββ | 520/1563 [05:47<10:16, 1.69it/s]
33%|ββββ | 521/1563 [05:47<11:05, 1.56it/s]
33%|ββββ | 522/1563 [05:48<11:32, 1.50it/s]
33%|ββββ | 523/1563 [05:49<12:23, 1.40it/s]
34%|ββββ | 524/1563 [05:50<12:38, 1.37it/s]
34%|ββββ | 525/1563 [05:50<11:23, 1.52it/s]
34%|ββββ | 526/1563 [05:51<10:31, 1.64it/s]
34%|ββββ | 527/1563 [05:51<09:47, 1.76it/s]
34%|ββββ | 528/1563 [05:52<09:32, 1.81it/s]
34%|ββββ | 529/1563 [05:53<10:46, 1.60it/s]
34%|ββββ | 530/1563 [05:53<11:03, 1.56it/s]
34%|ββββ | 531/1563 [05:54<11:41, 1.47it/s]
34%|ββββ | 532/1563 [05:55<12:23, 1.39it/s]
34%|ββββ | 533/1563 [05:55<11:39, 1.47it/s]
34%|ββββ | 534/1563 [05:56<10:35, 1.62it/s]
34%|ββββ | 535/1563 [05:57<11:36, 1.48it/s]
34%|ββββ | 536/1563 [05:57<10:32, 1.62it/s]
34%|ββββ | 537/1563 [05:58<11:41, 1.46it/s]
34%|ββββ | 538/1563 [05:58<10:38, 1.61it/s]
34%|ββββ | 539/1563 [05:59<10:42, 1.59it/s]
35%|ββββ | 540/1563 [06:00<11:06, 1.54it/s]
35%|ββββ | 541/1563 [06:00<10:50, 1.57it/s]
35%|ββββ | 542/1563 [06:01<10:15, 1.66it/s]
35%|ββββ | 543/1563 [06:02<11:19, 1.50it/s]
35%|ββββ | 544/1563 [06:03<11:52, 1.43it/s]
35%|ββββ | 545/1563 [06:03<12:37, 1.34it/s]
35%|ββββ | 546/1563 [06:04<13:11, 1.28it/s]
35%|ββββ | 547/1563 [06:05<12:41, 1.33it/s]
35%|ββββ | 548/1563 [06:05<11:05, 1.52it/s]
35%|ββββ | 549/1563 [06:06<11:03, 1.53it/s]
35%|ββββ | 550/1563 [06:06<09:48, 1.72it/s]
{'loss': 0.1867, 'grad_norm': 13.75, 'learning_rate': 1.2975047984644915e-05, 'epoch': 0.35}
35%|ββββ | 551/1563 [06:07<11:12, 1.50it/s]
35%|ββββ | 552/1563 [06:08<10:50, 1.55it/s]
35%|ββββ | 553/1563 [06:09<10:59, 1.53it/s]
35%|ββββ | 554/1563 [06:09<11:31, 1.46it/s]
36%|ββββ | 555/1563 [06:10<11:40, 1.44it/s]
36%|ββββ | 556/1563 [06:10<10:33, 1.59it/s]
36%|ββββ | 557/1563 [06:11<10:28, 1.60it/s]
36%|ββββ | 558/1563 [06:12<11:32, 1.45it/s]
36%|ββββ | 559/1563 [06:13<11:37, 1.44it/s]
36%|ββββ | 560/1563 [06:14<12:20, 1.35it/s]
36%|ββββ | 561/1563 [06:14<12:42, 1.31it/s]
36%|ββββ | 562/1563 [06:15<11:38, 1.43it/s]
36%|ββββ | 563/1563 [06:16<11:40, 1.43it/s]
36%|ββββ | 564/1563 [06:16<12:18, 1.35it/s]
36%|ββββ | 565/1563 [06:17<11:10, 1.49it/s]
36%|ββββ | 566/1563 [06:18<12:09, 1.37it/s]
36%|ββββ | 567/1563 [06:19<12:52, 1.29it/s]
36%|ββββ | 568/1563 [06:19<11:18, 1.47it/s]
36%|ββββ | 569/1563 [06:20<11:38, 1.42it/s]
36%|ββββ | 570/1563 [06:20<10:33, 1.57it/s]
37%|ββββ | 571/1563 [06:21<11:05, 1.49it/s]
37%|ββββ | 572/1563 [06:22<11:43, 1.41it/s]
37%|ββββ | 573/1563 [06:22<10:48, 1.53it/s]
37%|ββββ | 574/1563 [06:23<11:46, 1.40it/s]
37%|ββββ | 575/1563 [06:24<11:43, 1.40it/s]
37%|ββββ | 576/1563 [06:25<11:26, 1.44it/s]
37%|ββββ | 577/1563 [06:25<10:45, 1.53it/s]
37%|ββββ | 578/1563 [06:26<10:57, 1.50it/s]
37%|ββββ | 579/1563 [06:27<11:52, 1.38it/s]
37%|ββββ | 580/1563 [06:28<12:27, 1.31it/s]
37%|ββββ | 581/1563 [06:28<11:47, 1.39it/s]
37%|ββββ | 582/1563 [06:29<12:19, 1.33it/s]
37%|ββββ | 583/1563 [06:30<11:35, 1.41it/s]
37%|ββββ | 584/1563 [06:30<10:25, 1.56it/s]
37%|ββββ | 585/1563 [06:31<11:24, 1.43it/s]
37%|ββββ | 586/1563 [06:32<11:40, 1.39it/s]
38%|ββββ | 587/1563 [06:32<10:20, 1.57it/s]
38%|ββββ | 588/1563 [06:33<10:40, 1.52it/s]
38%|ββββ | 589/1563 [06:34<10:56, 1.48it/s]
38%|ββββ | 590/1563 [06:34<10:28, 1.55it/s]
38%|ββββ | 591/1563 [06:35<09:35, 1.69it/s]
38%|ββββ | 592/1563 [06:35<10:33, 1.53it/s]
38%|ββββ | 593/1563 [06:36<11:26, 1.41it/s]
38%|ββββ | 594/1563 [06:37<11:16, 1.43it/s]
38%|ββββ | 595/1563 [06:37<10:10, 1.59it/s]
38%|ββββ | 596/1563 [06:38<11:18, 1.43it/s]
38%|ββββ | 597/1563 [06:39<11:19, 1.42it/s]
38%|ββββ | 598/1563 [06:40<11:15, 1.43it/s]
38%|ββββ | 599/1563 [06:40<10:47, 1.49it/s]
38%|ββββ | 600/1563 [06:41<10:46, 1.49it/s]
{'loss': 0.1441, 'grad_norm': 12.9375, 'learning_rate': 1.233525271912988e-05, 'epoch': 0.38}
38%|ββββ | 601/1563 [06:42<10:20, 1.55it/s]
39%|ββββ | 602/1563 [06:42<10:39, 1.50it/s]
39%|ββββ | 603/1563 [06:43<11:30, 1.39it/s]
39%|ββββ | 604/1563 [06:44<10:25, 1.53it/s]
39%|ββββ | 605/1563 [06:44<09:32, 1.67it/s]
39%|ββββ | 606/1563 [06:45<09:12, 1.73it/s]
39%|ββββ | 607/1563 [06:45<10:15, 1.55it/s]
39%|ββββ | 608/1563 [06:46<09:38, 1.65it/s]
39%|ββββ | 609/1563 [06:46<08:56, 1.78it/s]
39%|ββββ | 610/1563 [06:47<08:35, 1.85it/s]
39%|ββββ | 611/1563 [06:47<08:37, 1.84it/s]
39%|ββββ | 612/1563 [06:48<08:20, 1.90it/s]
39%|ββββ | 613/1563 [06:49<08:40, 1.83it/s]
39%|ββββ | 614/1563 [06:49<09:41, 1.63it/s]
39%|ββββ | 615/1563 [06:50<09:58, 1.58it/s]
39%|ββββ | 616/1563 [06:51<10:42, 1.47it/s]
39%|ββββ | 617/1563 [06:51<09:47, 1.61it/s]
40%|ββββ | 618/1563 [06:52<10:25, 1.51it/s]
40%|ββββ | 619/1563 [06:53<10:27, 1.51it/s]
40%|ββββ | 620/1563 [06:53<09:28, 1.66it/s]
40%|ββββ | 621/1563 [06:54<09:50, 1.59it/s]
40%|ββββ | 622/1563 [06:55<11:01, 1.42it/s]
40%|ββββ | 623/1563 [06:55<09:59, 1.57it/s]
40%|ββββ | 624/1563 [06:56<09:58, 1.57it/s]
40%|ββββ | 625/1563 [06:57<10:15, 1.53it/s]
40%|ββββ | 626/1563 [06:57<10:34, 1.48it/s]
40%|ββββ | 627/1563 [06:58<10:25, 1.50it/s]
40%|ββββ | 628/1563 [06:59<11:19, 1.38it/s]
40%|ββββ | 629/1563 [06:59<10:25, 1.49it/s]
40%|ββββ | 630/1563 [07:00<11:17, 1.38it/s]
40%|ββββ | 631/1563 [07:01<10:28, 1.48it/s]
40%|ββββ | 632/1563 [07:01<09:21, 1.66it/s]
40%|ββββ | 633/1563 [07:02<10:06, 1.53it/s]
41%|ββββ | 634/1563 [07:02<09:23, 1.65it/s]
41%|ββββ | 635/1563 [07:03<09:06, 1.70it/s]
41%|ββββ | 636/1563 [07:03<08:36, 1.79it/s]
41%|ββββ | 637/1563 [07:04<10:00, 1.54it/s]
41%|ββββ | 638/1563 [07:05<10:37, 1.45it/s]
41%|ββββ | 639/1563 [07:06<09:51, 1.56it/s]
41%|ββββ | 640/1563 [07:06<09:47, 1.57it/s]
41%|ββββ | 641/1563 [07:07<09:04, 1.69it/s]
41%|ββββ | 642/1563 [07:07<09:35, 1.60it/s]
41%|ββββ | 643/1563 [07:08<09:14, 1.66it/s]
41%|ββββ | 644/1563 [07:09<09:42, 1.58it/s]
41%|βββββ | 645/1563 [07:10<10:44, 1.42it/s]
41%|βββββ | 646/1563 [07:10<09:49, 1.56it/s]
41%|βββββ | 647/1563 [07:11<09:37, 1.59it/s]
41%|βββββ | 648/1563 [07:12<10:40, 1.43it/s]
42%|βββββ | 649/1563 [07:12<11:21, 1.34it/s]
42%|βββββ | 650/1563 [07:13<10:52, 1.40it/s]
{'loss': 0.1872, 'grad_norm': 23.0, 'learning_rate': 1.1695457453614845e-05, 'epoch': 0.42}
42%|βββββ | 651/1563 [07:14<09:55, 1.53it/s]
42%|βββββ | 652/1563 [07:14<08:53, 1.71it/s]
42%|βββββ | 653/1563 [07:15<10:06, 1.50it/s]
42%|βββββ | 654/1563 [07:15<09:14, 1.64it/s]
42%|βββββ | 655/1563 [07:16<08:38, 1.75it/s]
42%|βββββ | 656/1563 [07:17<09:55, 1.52it/s]
42%|βββββ | 657/1563 [07:17<10:19, 1.46it/s]
42%|βββββ | 658/1563 [07:18<11:02, 1.37it/s]
42%|βββββ | 659/1563 [07:19<11:41, 1.29it/s]
42%|βββββ | 660/1563 [07:20<10:26, 1.44it/s]
42%|βββββ | 661/1563 [07:20<10:40, 1.41it/s]
42%|βββββ | 662/1563 [07:21<09:19, 1.61it/s]
42%|βββββ | 663/1563 [07:21<09:34, 1.57it/s]
42%|βββββ | 664/1563 [07:22<09:48, 1.53it/s]
43%|βββββ | 665/1563 [07:23<09:10, 1.63it/s]
43%|βββββ | 666/1563 [07:23<08:53, 1.68it/s]
43%|βββββ | 667/1563 [07:24<08:22, 1.78it/s]
43%|βββββ | 668/1563 [07:24<09:24, 1.59it/s]
43%|βββββ | 669/1563 [07:25<08:44, 1.70it/s]
43%|βββββ | 670/1563 [07:25<08:02, 1.85it/s]
43%|βββββ | 671/1563 [07:26<07:39, 1.94it/s]
43%|βββββ | 672/1563 [07:26<07:15, 2.04it/s]
43%|βββββ | 673/1563 [07:27<07:24, 2.00it/s]
43%|βββββ | 674/1563 [07:28<08:56, 1.66it/s]
43%|βββββ | 675/1563 [07:28<09:45, 1.52it/s]
43%|βββββ | 676/1563 [07:29<09:01, 1.64it/s]
43%|βββββ | 677/1563 [07:30<09:13, 1.60it/s]
43%|βββββ | 678/1563 [07:30<08:19, 1.77it/s]
43%|βββββ | 679/1563 [07:31<08:45, 1.68it/s]
44%|βββββ | 680/1563 [07:31<09:09, 1.61it/s]
44%|βββββ | 681/1563 [07:32<08:09, 1.80it/s]
44%|βββββ | 682/1563 [07:32<07:43, 1.90it/s]
44%|βββββ | 683/1563 [07:33<08:21, 1.76it/s]
44%|βββββ | 684/1563 [07:34<08:38, 1.69it/s]
44%|βββββ | 685/1563 [07:34<09:01, 1.62it/s]
44%|βββββ | 686/1563 [07:35<08:16, 1.77it/s]
44%|βββββ | 687/1563 [07:35<08:04, 1.81it/s]
44%|βββββ | 688/1563 [07:36<08:43, 1.67it/s]
44%|βββββ | 689/1563 [07:37<09:25, 1.55it/s]
44%|βββββ | 690/1563 [07:38<10:21, 1.41it/s]
44%|βββββ | 691/1563 [07:38<10:58, 1.32it/s]
44%|βββββ | 692/1563 [07:39<09:57, 1.46it/s]
44%|βββββ | 693/1563 [07:39<09:03, 1.60it/s]
44%|βββββ | 694/1563 [07:40<08:21, 1.73it/s]
44%|βββββ | 695/1563 [07:40<08:14, 1.76it/s]
45%|βββββ | 696/1563 [07:41<07:46, 1.86it/s]
45%|βββββ | 697/1563 [07:41<07:29, 1.93it/s]
45%|βββββ | 698/1563 [07:42<07:40, 1.88it/s]
45%|βββββ | 699/1563 [07:42<07:29, 1.92it/s]
45%|βββββ | 700/1563 [07:43<07:07, 2.02it/s]
{'loss': 0.1618, 'grad_norm': 16.625, 'learning_rate': 1.105566218809981e-05, 'epoch': 0.45}
45%|βββββ | 701/1563 [07:44<08:45, 1.64it/s]
45%|βββββ | 702/1563 [07:44<08:32, 1.68it/s]
45%|βββββ | 703/1563 [07:45<07:54, 1.81it/s]
45%|βββββ | 704/1563 [07:45<08:53, 1.61it/s]
45%|βββββ | 705/1563 [07:46<08:14, 1.73it/s]
45%|βββββ | 706/1563 [07:46<07:43, 1.85it/s]
45%|βββββ | 707/1563 [07:47<08:44, 1.63it/s]
45%|βββββ | 708/1563 [07:48<08:04, 1.76it/s]
45%|βββββ | 709/1563 [07:48<07:37, 1.87it/s]
45%|βββββ | 710/1563 [07:49<08:14, 1.72it/s]
45%|βββββ | 711/1563 [07:50<08:49, 1.61it/s]
46%|βββββ | 712/1563 [07:50<08:43, 1.63it/s]
46%|βββββ | 713/1563 [07:51<09:29, 1.49it/s]
46%|βββββ | 714/1563 [07:51<08:29, 1.67it/s]
46%|βββββ | 715/1563 [07:52<08:45, 1.61it/s]
46%|βββββ | 716/1563 [07:52<08:01, 1.76it/s]
46%|βββββ | 717/1563 [07:53<09:02, 1.56it/s]
46%|βββββ | 718/1563 [07:54<08:42, 1.62it/s]
46%|βββββ | 719/1563 [07:54<08:00, 1.76it/s]
46%|βββββ | 720/1563 [07:55<07:51, 1.79it/s]
46%|βββββ | 721/1563 [07:56<08:49, 1.59it/s]
46%|βββββ | 722/1563 [07:56<09:32, 1.47it/s]
46%|βββββ | 723/1563 [07:57<10:14, 1.37it/s]
46%|βββββ | 724/1563 [07:58<09:01, 1.55it/s]
46%|βββββ | 725/1563 [07:59<09:51, 1.42it/s]
46%|βββββ | 726/1563 [07:59<09:37, 1.45it/s]
47%|βββββ | 727/1563 [08:00<09:37, 1.45it/s]
47%|βββββ | 728/1563 [08:00<08:38, 1.61it/s]
47%|βββββ | 729/1563 [08:01<08:54, 1.56it/s]
47%|βββββ | 730/1563 [08:02<09:17, 1.49it/s]
47%|βββββ | 731/1563 [08:02<08:39, 1.60it/s]
47%|βββββ | 732/1563 [08:03<09:29, 1.46it/s]
47%|βββββ | 733/1563 [08:04<08:56, 1.55it/s]
47%|βββββ | 734/1563 [08:04<08:14, 1.68it/s]
47%|βββββ | 735/1563 [08:05<09:16, 1.49it/s]
47%|βββββ | 736/1563 [08:06<08:31, 1.62it/s]
47%|βββββ | 737/1563 [08:06<09:10, 1.50it/s]
47%|βββββ | 738/1563 [08:07<08:32, 1.61it/s]
47%|βββββ | 739/1563 [08:07<08:03, 1.70it/s]
47%|βββββ | 740/1563 [08:08<08:58, 1.53it/s]
47%|βββββ | 741/1563 [08:09<09:29, 1.44it/s]
47%|βββββ | 742/1563 [08:10<09:52, 1.38it/s]
48%|βββββ | 743/1563 [08:11<10:29, 1.30it/s]
48%|βββββ | 744/1563 [08:11<09:26, 1.45it/s]
48%|βββββ | 745/1563 [08:12<10:10, 1.34it/s]
48%|βββββ | 746/1563 [08:12<09:01, 1.51it/s]
48%|βββββ | 747/1563 [08:13<09:05, 1.50it/s]
48%|βββββ | 748/1563 [08:14<09:55, 1.37it/s]
48%|βββββ | 749/1563 [08:14<08:51, 1.53it/s]
48%|βββββ | 750/1563 [08:15<08:16, 1.64it/s]
{'loss': 0.1435, 'grad_norm': 16.0, 'learning_rate': 1.0415866922584774e-05, 'epoch': 0.48}
48%|βββββ | 751/1563 [08:16<08:51, 1.53it/s]
48%|βββββ | 752/1563 [08:16<07:55, 1.70it/s]
48%|βββββ | 753/1563 [08:17<08:10, 1.65it/s]
48%|βββββ | 754/1563 [08:18<09:03, 1.49it/s]
48%|βββββ | 755/1563 [08:18<09:04, 1.48it/s]
48%|βββββ | 756/1563 [08:19<09:52, 1.36it/s]
48%|βββββ | 757/1563 [08:20<08:52, 1.51it/s]
48%|βββββ | 758/1563 [08:20<09:10, 1.46it/s]
49%|βββββ | 759/1563 [08:21<09:37, 1.39it/s]
49%|βββββ | 760/1563 [08:22<08:37, 1.55it/s]
49%|βββββ | 761/1563 [08:23<09:24, 1.42it/s]
49%|βββββ | 762/1563 [08:23<08:16, 1.61it/s]
49%|βββββ | 763/1563 [08:23<07:41, 1.73it/s]
49%|βββββ | 764/1563 [08:24<08:29, 1.57it/s]
49%|βββββ | 765/1563 [08:25<09:21, 1.42it/s]
49%|βββββ | 766/1563 [08:26<09:58, 1.33it/s]
49%|βββββ | 767/1563 [08:27<10:19, 1.28it/s]
49%|βββββ | 768/1563 [08:27<09:46, 1.36it/s]
49%|βββββ | 769/1563 [08:28<08:46, 1.51it/s]
49%|βββββ | 770/1563 [08:28<07:57, 1.66it/s]
49%|βββββ | 771/1563 [08:29<08:52, 1.49it/s]
49%|βββββ | 772/1563 [08:30<09:05, 1.45it/s]
49%|βββββ | 773/1563 [08:31<09:27, 1.39it/s]
50%|βββββ | 774/1563 [08:31<09:18, 1.41it/s]
50%|βββββ | 775/1563 [08:32<08:19, 1.58it/s]
50%|βββββ | 776/1563 [08:33<08:47, 1.49it/s]
50%|βββββ | 777/1563 [08:33<08:36, 1.52it/s]
50%|βββββ | 778/1563 [08:34<08:55, 1.47it/s]
50%|βββββ | 779/1563 [08:35<09:34, 1.36it/s]
50%|βββββ | 780/1563 [08:35<08:46, 1.49it/s]
50%|βββββ | 781/1563 [08:36<08:51, 1.47it/s]
50%|βββββ | 782/1563 [08:37<07:59, 1.63it/s]
50%|βββββ | 783/1563 [08:37<07:39, 1.70it/s]
50%|βββββ | 784/1563 [08:38<07:41, 1.69it/s]
50%|βββββ | 785/1563 [08:38<08:01, 1.62it/s]
50%|βββββ | 786/1563 [08:39<07:24, 1.75it/s]
50%|βββββ | 787/1563 [08:40<08:15, 1.57it/s]
50%|βββββ | 788/1563 [08:40<07:32, 1.71it/s]
50%|βββββ | 789/1563 [08:41<07:52, 1.64it/s]
51%|βββββ | 790/1563 [08:41<07:15, 1.77it/s]
51%|βββββ | 791/1563 [08:42<07:17, 1.77it/s]
51%|βββββ | 792/1563 [08:42<07:30, 1.71it/s]
51%|βββββ | 793/1563 [08:43<06:54, 1.86it/s]
51%|βββββ | 794/1563 [08:44<07:46, 1.65it/s]
51%|βββββ | 795/1563 [08:44<07:07, 1.80it/s]
51%|βββββ | 796/1563 [08:45<08:13, 1.55it/s]
51%|βββββ | 797/1563 [08:46<09:02, 1.41it/s]
51%|βββββ | 798/1563 [08:46<08:12, 1.55it/s]
51%|βββββ | 799/1563 [08:47<09:00, 1.41it/s]
51%|βββββ | 800/1563 [08:48<09:27, 1.34it/s]
{'loss': 0.1442, 'grad_norm': 1.359375, 'learning_rate': 9.776071657069739e-06, 'epoch': 0.51}
51%|βββββ | 801/1563 [08:49<09:30, 1.34it/s]
51%|ββββββ | 802/1563 [08:49<09:12, 1.38it/s]
51%|ββββββ | 803/1563 [08:50<08:01, 1.58it/s]
51%|ββββββ | 804/1563 [08:51<08:58, 1.41it/s]
52%|ββββββ | 805/1563 [08:51<09:24, 1.34it/s]
52%|ββββββ | 806/1563 [08:52<09:22, 1.35it/s]
52%|ββββββ | 807/1563 [08:53<08:33, 1.47it/s]
52%|ββββββ | 808/1563 [08:53<07:47, 1.61it/s]
52%|ββββββ | 809/1563 [08:54<08:37, 1.46it/s]
52%|ββββββ | 810/1563 [08:55<09:07, 1.38it/s]
52%|ββββββ | 811/1563 [08:55<07:59, 1.57it/s]
52%|ββββββ | 812/1563 [08:56<08:13, 1.52it/s]
52%|ββββββ | 813/1563 [08:57<08:36, 1.45it/s]
52%|ββββββ | 814/1563 [08:57<08:31, 1.46it/s]
52%|ββββββ | 815/1563 [08:58<09:05, 1.37it/s]
52%|ββββββ | 816/1563 [08:59<09:21, 1.33it/s]
52%|ββββββ | 817/1563 [09:00<09:04, 1.37it/s]
52%|ββββββ | 818/1563 [09:00<08:58, 1.38it/s]
52%|ββββββ | 819/1563 [09:01<07:51, 1.58it/s]
52%|ββββββ | 820/1563 [09:01<07:42, 1.61it/s]
53%|ββββββ | 821/1563 [09:02<07:15, 1.70it/s]
53%|ββββββ | 822/1563 [09:02<06:44, 1.83it/s]
53%|ββββββ | 823/1563 [09:03<07:52, 1.57it/s]
53%|ββββββ | 824/1563 [09:04<08:42, 1.41it/s]
53%|ββββββ | 825/1563 [09:05<08:46, 1.40it/s]
53%|ββββββ | 826/1563 [09:05<08:17, 1.48it/s]
53%|ββββββ | 827/1563 [09:06<07:30, 1.64it/s]
53%|ββββββ | 828/1563 [09:07<08:13, 1.49it/s]
53%|ββββββ | 829/1563 [09:08<08:40, 1.41it/s]
53%|ββββββ | 830/1563 [09:08<08:19, 1.47it/s]
53%|ββββββ | 831/1563 [09:09<08:19, 1.46it/s]
53%|ββββββ | 832/1563 [09:09<07:37, 1.60it/s]
53%|ββββββ | 833/1563 [09:10<07:50, 1.55it/s]
53%|ββββββ | 834/1563 [09:10<07:03, 1.72it/s]
53%|ββββββ | 835/1563 [09:11<07:14, 1.68it/s]
53%|ββββββ | 836/1563 [09:12<08:08, 1.49it/s]
54%|ββββββ | 837/1563 [09:13<08:08, 1.49it/s]
54%|ββββββ | 838/1563 [09:13<08:21, 1.44it/s]
54%|ββββββ | 839/1563 [09:14<08:51, 1.36it/s]
54%|ββββββ | 840/1563 [09:15<08:45, 1.37it/s]
54%|ββββββ | 841/1563 [09:16<09:04, 1.33it/s]
54%|ββββββ | 842/1563 [09:17<09:27, 1.27it/s]
54%|ββββββ | 843/1563 [09:17<09:41, 1.24it/s]
54%|ββββββ | 844/1563 [09:18<09:18, 1.29it/s]
54%|ββββββ | 845/1563 [09:19<08:53, 1.35it/s]
54%|ββββββ | 846/1563 [09:19<08:22, 1.43it/s]
54%|ββββββ | 847/1563 [09:20<07:29, 1.59it/s]
54%|ββββββ | 848/1563 [09:21<08:12, 1.45it/s]
54%|ββββββ | 849/1563 [09:22<08:42, 1.37it/s]
54%|ββββββ | 850/1563 [09:22<07:55, 1.50it/s]
{'loss': 0.1416, 'grad_norm': 1.0859375, 'learning_rate': 9.136276391554704e-06, 'epoch': 0.54}
54%|ββββββ | 851/1563 [09:23<08:05, 1.47it/s]
55%|ββββββ | 852/1563 [09:23<07:20, 1.62it/s]
55%|ββββββ | 853/1563 [09:24<07:00, 1.69it/s]
55%|ββββββ | 854/1563 [09:24<06:31, 1.81it/s]
55%|ββββββ | 855/1563 [09:25<07:20, 1.61it/s]
55%|ββββββ | 856/1563 [09:26<07:03, 1.67it/s]
55%|ββββββ | 857/1563 [09:26<06:33, 1.79it/s]
55%|ββββββ | 858/1563 [09:27<07:33, 1.55it/s]
55%|ββββββ | 859/1563 [09:28<07:41, 1.53it/s]
55%|ββββββ | 860/1563 [09:28<07:04, 1.66it/s]
55%|ββββββ | 861/1563 [09:29<07:14, 1.62it/s]
55%|ββββββ | 862/1563 [09:29<06:28, 1.81it/s]
55%|ββββββ | 863/1563 [09:30<06:04, 1.92it/s]
55%|ββββββ | 864/1563 [09:30<06:33, 1.78it/s]
55%|ββββββ | 865/1563 [09:31<07:24, 1.57it/s]
55%|ββββββ | 866/1563 [09:31<06:43, 1.73it/s]
55%|ββββββ | 867/1563 [09:32<07:26, 1.56it/s]
56%|ββββββ | 868/1563 [09:33<07:24, 1.56it/s]
56%|ββββββ | 869/1563 [09:33<06:48, 1.70it/s]
56%|ββββββ | 870/1563 [09:34<06:22, 1.81it/s]
56%|ββββββ | 871/1563 [09:34<06:05, 1.89it/s]
56%|ββββββ | 872/1563 [09:35<07:09, 1.61it/s]
56%|ββββββ | 873/1563 [09:36<06:40, 1.72it/s]
56%|ββββββ | 874/1563 [09:36<06:54, 1.66it/s]
56%|ββββββ | 875/1563 [09:37<07:20, 1.56it/s]
56%|ββββββ | 876/1563 [09:38<07:35, 1.51it/s]
56%|ββββββ | 877/1563 [09:38<07:23, 1.55it/s]
56%|ββββββ | 878/1563 [09:39<06:46, 1.68it/s]
56%|ββββββ | 879/1563 [09:39<07:08, 1.60it/s]
56%|ββββββ | 880/1563 [09:40<07:45, 1.47it/s]
56%|ββββββ | 881/1563 [09:41<06:54, 1.65it/s]
56%|ββββββ | 882/1563 [09:42<07:47, 1.46it/s]
56%|ββββββ | 883/1563 [09:42<07:02, 1.61it/s]
57%|ββββββ | 884/1563 [09:43<07:49, 1.45it/s]
57%|ββββββ | 885/1563 [09:44<07:44, 1.46it/s]
57%|ββββββ | 886/1563 [09:44<07:30, 1.50it/s]
57%|ββββββ | 887/1563 [09:45<06:42, 1.68it/s]
57%|ββββββ | 888/1563 [09:45<07:03, 1.59it/s]
57%|ββββββ | 889/1563 [09:46<07:25, 1.51it/s]
57%|ββββββ | 890/1563 [09:47<07:57, 1.41it/s]
57%|ββββββ | 891/1563 [09:47<07:22, 1.52it/s]
57%|ββββββ | 892/1563 [09:48<06:45, 1.66it/s]
57%|ββββββ | 893/1563 [09:48<06:24, 1.74it/s]
57%|ββββββ | 894/1563 [09:49<06:08, 1.82it/s]
57%|ββββββ | 895/1563 [09:50<07:10, 1.55it/s]
57%|ββββββ | 896/1563 [09:50<06:50, 1.63it/s]
57%|ββββββ | 897/1563 [09:51<06:38, 1.67it/s]
57%|ββββββ | 898/1563 [09:51<06:20, 1.75it/s]
58%|ββββββ | 899/1563 [09:52<06:25, 1.72it/s]
58%|ββββββ | 900/1563 [09:52<06:03, 1.83it/s]
{'loss': 0.1478, 'grad_norm': 5.09375, 'learning_rate': 8.496481126039668e-06, 'epoch': 0.58}
58%|ββββββ | 900/1563 [09:53<06:03, 1.83it/s]
58%|ββββββ | 901/1563 [09:53<05:52, 1.88it/s]
58%|ββββββ | 902/1563 [09:54<06:17, 1.75it/s]
58%|ββββββ | 903/1563 [09:54<06:01, 1.83it/s]
58%|ββββββ | 904/1563 [09:55<07:03, 1.56it/s]
58%|ββββββ | 905/1563 [09:55<06:29, 1.69it/s]
58%|ββββββ | 906/1563 [09:56<06:09, 1.78it/s]
58%|ββββββ | 907/1563 [09:57<06:46, 1.61it/s]
58%|ββββββ | 908/1563 [09:57<06:19, 1.73it/s]
58%|ββββββ | 909/1563 [09:58<06:40, 1.63it/s]
58%|ββββββ | 910/1563 [09:58<06:14, 1.75it/s]
58%|ββββββ | 911/1563 [09:59<06:05, 1.78it/s]
58%|ββββββ | 912/1563 [10:00<06:28, 1.68it/s]
58%|ββββββ | 913/1563 [10:00<06:12, 1.74it/s]
58%|ββββββ | 914/1563 [10:01<06:31, 1.66it/s]
59%|ββββββ | 915/1563 [10:01<06:19, 1.71it/s]
59%|ββββββ | 916/1563 [10:02<06:11, 1.74it/s]
59%|ββββββ | 917/1563 [10:02<05:45, 1.87it/s]
59%|ββββββ | 918/1563 [10:03<05:36, 1.92it/s]
59%|ββββββ | 919/1563 [10:04<06:17, 1.70it/s]
59%|ββββββ | 920/1563 [10:04<06:56, 1.55it/s]
59%|ββββββ | 921/1563 [10:05<07:02, 1.52it/s]
59%|ββββββ | 922/1563 [10:06<07:04, 1.51it/s]
59%|ββββββ | 923/1563 [10:06<07:23, 1.44it/s]
59%|ββββββ | 924/1563 [10:07<07:12, 1.48it/s]
59%|ββββββ | 925/1563 [10:08<06:29, 1.64it/s]
59%|ββββββ | 926/1563 [10:08<07:05, 1.50it/s]
59%|ββββββ | 927/1563 [10:09<07:20, 1.44it/s]
59%|ββββββ | 928/1563 [10:10<06:42, 1.58it/s]
59%|ββββββ | 929/1563 [10:10<06:10, 1.71it/s]
60%|ββββββ | 930/1563 [10:10<05:44, 1.84it/s]
60%|ββββββ | 931/1563 [10:11<05:30, 1.91it/s]
60%|ββββββ | 932/1563 [10:11<05:13, 2.01it/s]
60%|ββββββ | 933/1563 [10:12<05:47, 1.81it/s]
60%|ββββββ | 934/1563 [10:13<06:41, 1.56it/s]
60%|ββββββ | 935/1563 [10:13<06:17, 1.66it/s]
60%|ββββββ | 936/1563 [10:14<07:04, 1.48it/s]
60%|ββββββ | 937/1563 [10:15<06:19, 1.65it/s]
60%|ββββββ | 938/1563 [10:16<06:57, 1.50it/s]
60%|ββββββ | 939/1563 [10:16<07:07, 1.46it/s]
60%|ββββββ | 940/1563 [10:17<06:38, 1.56it/s]
60%|ββββββ | 941/1563 [10:17<05:58, 1.73it/s]
60%|ββββββ | 942/1563 [10:18<06:34, 1.57it/s]
60%|ββββββ | 943/1563 [10:19<06:54, 1.50it/s]
60%|ββββββ | 944/1563 [10:19<06:49, 1.51it/s]
60%|ββββββ | 945/1563 [10:20<06:18, 1.63it/s]
61%|ββββββ | 946/1563 [10:20<05:44, 1.79it/s]
61%|ββββββ | 947/1563 [10:21<06:04, 1.69it/s]
61%|ββββββ | 948/1563 [10:21<05:31, 1.86it/s]
61%|ββββββ | 949/1563 [10:22<05:19, 1.92it/s]
61%|ββββββ | 950/1563 [10:22<05:15, 1.94it/s]
{'loss': 0.1462, 'grad_norm': 27.75, 'learning_rate': 7.856685860524633e-06, 'epoch': 0.61}
61%|ββββββ | 951/1563 [10:23<06:16, 1.63it/s]
61%|ββββββ | 952/1563 [10:24<05:56, 1.71it/s]
61%|ββββββ | 953/1563 [10:24<05:36, 1.81it/s]
61%|ββββββ | 954/1563 [10:25<05:29, 1.85it/s]
61%|ββββββ | 955/1563 [10:26<06:08, 1.65it/s]
61%|ββββββ | 956/1563 [10:26<06:20, 1.59it/s]
61%|ββββββ | 957/1563 [10:27<06:33, 1.54it/s]
61%|βββββββ | 958/1563 [10:27<06:14, 1.62it/s]
61%|βββββββ | 959/1563 [10:28<06:08, 1.64it/s]
61%|βββββββ | 960/1563 [10:29<06:36, 1.52it/s]
61%|βββββββ | 961/1563 [10:29<06:41, 1.50it/s]
62%|βββββββ | 962/1563 [10:30<06:36, 1.52it/s]
62%|βββββββ | 963/1563 [10:31<06:15, 1.60it/s]
62%|βββββββ | 964/1563 [10:31<06:36, 1.51it/s]
62%|βββββββ | 965/1563 [10:32<06:22, 1.56it/s]
62%|βββββββ | 966/1563 [10:33<06:15, 1.59it/s]
62%|βββββββ | 967/1563 [10:33<06:32, 1.52it/s]
62%|βββββββ | 968/1563 [10:34<05:59, 1.65it/s]
62%|βββββββ | 969/1563 [10:35<06:41, 1.48it/s]
62%|βββββββ | 970/1563 [10:35<06:04, 1.63it/s]
62%|βββββββ | 971/1563 [10:36<06:34, 1.50it/s]
62%|βββββββ | 972/1563 [10:36<06:05, 1.62it/s]
62%|βββββββ | 973/1563 [10:37<06:50, 1.44it/s]
62%|βββββββ | 974/1563 [10:38<06:47, 1.45it/s]
62%|βββββββ | 975/1563 [10:39<07:15, 1.35it/s]
62%|βββββββ | 976/1563 [10:40<07:29, 1.30it/s]
63%|βββββββ | 977/1563 [10:40<07:16, 1.34it/s]
63%|βββββββ | 978/1563 [10:41<07:03, 1.38it/s]
63%|βββββββ | 979/1563 [10:42<07:03, 1.38it/s]
63%|βββββββ | 980/1563 [10:43<07:27, 1.30it/s]
63%|βββββββ | 981/1563 [10:43<07:39, 1.27it/s]
63%|βββββββ | 982/1563 [10:44<07:41, 1.26it/s]
63%|βββββββ | 983/1563 [10:45<07:51, 1.23it/s]
63%|βββββββ | 984/1563 [10:46<07:00, 1.38it/s]
63%|βββββββ | 985/1563 [10:46<06:14, 1.54it/s]
63%|βββββββ | 986/1563 [10:47<06:52, 1.40it/s]
63%|βββββββ | 987/1563 [10:48<06:53, 1.39it/s]
63%|βββββββ | 988/1563 [10:48<06:53, 1.39it/s]
63%|βββββββ | 989/1563 [10:49<06:43, 1.42it/s]
63%|βββββββ | 990/1563 [10:50<06:22, 1.50it/s]
63%|βββββββ | 991/1563 [10:50<06:46, 1.41it/s]
63%|βββββββ | 992/1563 [10:51<06:14, 1.53it/s]
64%|βββββββ | 993/1563 [10:52<06:04, 1.56it/s]
64%|βββββββ | 994/1563 [10:52<06:06, 1.55it/s]
64%|βββββββ | 995/1563 [10:53<05:51, 1.62it/s]
64%|βββββββ | 996/1563 [10:53<05:33, 1.70it/s]
64%|βββββββ | 997/1563 [10:54<05:15, 1.80it/s]
64%|βββββββ | 998/1563 [10:55<06:05, 1.54it/s]
64%|βββββββ | 999/1563 [10:55<05:53, 1.60it/s]
64%|βββββββ | 1000/1563 [10:56<06:25, 1.46it/s]
{'loss': 0.1499, 'grad_norm': 0.921875, 'learning_rate': 7.216890595009598e-06, 'epoch': 0.64}
64%|βββββββ | 1001/1563 [10:57<06:18, 1.48it/s]
64%|βββββββ | 1002/1563 [10:57<06:20, 1.47it/s]
64%|βββββββ | 1003/1563 [10:58<06:31, 1.43it/s]
64%|βββββββ | 1004/1563 [10:59<06:33, 1.42it/s]
64%|βββββββ | 1005/1563 [10:59<06:01, 1.54it/s]
64%|βββββββ | 1006/1563 [11:00<06:42, 1.39it/s]
64%|βββββββ | 1007/1563 [11:01<07:06, 1.31it/s]
64%|βββββββ | 1008/1563 [11:02<07:16, 1.27it/s]
65%|βββββββ | 1009/1563 [11:03<06:51, 1.34it/s]
65%|βββββββ | 1010/1563 [11:04<07:09, 1.29it/s]
65%|βββββββ | 1011/1563 [11:04<07:22, 1.25it/s]
65%|βββββββ | 1012/1563 [11:05<06:26, 1.43it/s]
65%|βββββββ | 1013/1563 [11:06<06:29, 1.41it/s]
65%|βββββββ | 1014/1563 [11:06<05:42, 1.60it/s]
65%|βββββββ | 1015/1563 [11:06<05:21, 1.70it/s]
65%|βββββββ | 1016/1563 [11:07<05:07, 1.78it/s]
65%|βββββββ | 1017/1563 [11:07<04:50, 1.88it/s]
65%|βββββββ | 1018/1563 [11:08<05:20, 1.70it/s]
65%|βββββββ | 1019/1563 [11:09<05:31, 1.64it/s]
65%|βββββββ | 1020/1563 [11:09<05:06, 1.77it/s]
65%|βββββββ | 1021/1563 [11:10<05:30, 1.64it/s]
65%|βββββββ | 1022/1563 [11:11<05:14, 1.72it/s]
65%|βββββββ | 1023/1563 [11:11<04:50, 1.86it/s]
66%|βββββββ | 1024/1563 [11:11<04:45, 1.89it/s]
66%|βββββββ | 1025/1563 [11:12<05:09, 1.74it/s]
66%|βββββββ | 1026/1563 [11:13<04:52, 1.83it/s]
66%|βββββββ | 1027/1563 [11:13<05:37, 1.59it/s]
66%|βββββββ | 1028/1563 [11:14<06:13, 1.43it/s]
66%|βββββββ | 1029/1563 [11:15<06:31, 1.36it/s]
66%|βββββββ | 1030/1563 [11:16<06:24, 1.38it/s]
66%|βββββββ | 1031/1563 [11:17<06:19, 1.40it/s]
66%|βββββββ | 1032/1563 [11:17<05:35, 1.58it/s]
66%|βββββββ | 1033/1563 [11:18<05:39, 1.56it/s]
66%|βββββββ | 1034/1563 [11:18<06:15, 1.41it/s]
66%|βββββββ | 1035/1563 [11:19<06:14, 1.41it/s]
66%|βββββββ | 1036/1563 [11:20<05:33, 1.58it/s]
66%|βββββββ | 1037/1563 [11:20<05:44, 1.53it/s]
66%|βββββββ | 1038/1563 [11:21<06:09, 1.42it/s]
66%|βββββββ | 1039/1563 [11:22<05:44, 1.52it/s]
67%|βββββββ | 1040/1563 [11:23<06:15, 1.39it/s]
67%|βββββββ | 1041/1563 [11:23<05:40, 1.53it/s]
67%|βββββββ | 1042/1563 [11:24<06:04, 1.43it/s]
67%|βββββββ | 1043/1563 [11:24<05:33, 1.56it/s]
67%|βββββββ | 1044/1563 [11:25<06:07, 1.41it/s]
67%|βββββββ | 1045/1563 [11:26<06:12, 1.39it/s]
67%|βββββββ | 1046/1563 [11:26<05:27, 1.58it/s]
67%|βββββββ | 1047/1563 [11:27<05:57, 1.44it/s]
67%|βββββββ | 1048/1563 [11:28<05:17, 1.62it/s]
67%|βββββββ | 1049/1563 [11:28<04:51, 1.77it/s]
67%|βββββββ | 1050/1563 [11:29<04:28, 1.91it/s]
{'loss': 0.1438, 'grad_norm': 13.25, 'learning_rate': 6.577095329494563e-06, 'epoch': 0.67}
67%|βββββββ | 1051/1563 [11:29<04:18, 1.98it/s]
67%|βββββββ | 1052/1563 [11:30<04:14, 2.01it/s]
67%|βββββββ | 1053/1563 [11:30<05:19, 1.60it/s]
67%|βββββββ | 1054/1563 [11:31<05:51, 1.45it/s]
67%|βββββββ | 1055/1563 [11:32<06:14, 1.36it/s]
68%|βββββββ | 1056/1563 [11:33<06:18, 1.34it/s]
68%|βββββββ | 1057/1563 [11:34<06:13, 1.35it/s]
68%|βββββββ | 1058/1563 [11:34<06:28, 1.30it/s]
68%|βββββββ | 1059/1563 [11:35<06:41, 1.26it/s]
68%|βββββββ | 1060/1563 [11:36<06:28, 1.29it/s]
68%|βββββββ | 1061/1563 [11:37<06:31, 1.28it/s]
68%|βββββββ | 1062/1563 [11:38<06:34, 1.27it/s]
68%|βββββββ | 1063/1563 [11:38<06:37, 1.26it/s]
68%|βββββββ | 1064/1563 [11:39<05:58, 1.39it/s]
68%|βββββββ | 1065/1563 [11:40<06:18, 1.31it/s]
68%|βββββββ | 1066/1563 [11:40<05:30, 1.50it/s]
68%|βββββββ | 1067/1563 [11:41<05:45, 1.44it/s]
68%|βββββββ | 1068/1563 [11:42<05:45, 1.43it/s]
68%|βββββββ | 1069/1563 [11:43<06:04, 1.36it/s]
68%|βββββββ | 1070/1563 [11:43<06:20, 1.29it/s]
69%|βββββββ | 1071/1563 [11:44<05:32, 1.48it/s]
69%|βββββββ | 1072/1563 [11:45<05:59, 1.36it/s]
69%|βββββββ | 1073/1563 [11:46<06:18, 1.30it/s]
69%|βββββββ | 1074/1563 [11:46<05:28, 1.49it/s]
69%|βββββββ | 1075/1563 [11:47<05:52, 1.38it/s]
69%|βββββββ | 1076/1563 [11:48<06:06, 1.33it/s]
69%|βββββββ | 1077/1563 [11:48<05:45, 1.41it/s]
69%|βββββββ | 1078/1563 [11:49<05:37, 1.44it/s]
69%|βββββββ | 1079/1563 [11:50<05:55, 1.36it/s]
69%|βββββββ | 1080/1563 [11:50<05:31, 1.46it/s]
69%|βββββββ | 1081/1563 [11:51<05:58, 1.34it/s]
69%|βββββββ | 1082/1563 [11:52<06:16, 1.28it/s]
69%|βββββββ | 1083/1563 [11:53<05:57, 1.34it/s]
69%|βββββββ | 1084/1563 [11:54<06:07, 1.30it/s]
69%|βββββββ | 1085/1563 [11:54<05:44, 1.39it/s]
69%|βββββββ | 1086/1563 [11:55<05:18, 1.50it/s]
70%|βββββββ | 1087/1563 [11:55<05:18, 1.50it/s]
70%|βββββββ | 1088/1563 [11:56<04:56, 1.60it/s]
70%|βββββββ | 1089/1563 [11:56<04:36, 1.72it/s]
70%|βββββββ | 1090/1563 [11:57<05:12, 1.51it/s]
70%|βββββββ | 1091/1563 [11:58<05:40, 1.39it/s]
70%|βββββββ | 1092/1563 [11:59<05:11, 1.51it/s]
70%|βββββββ | 1093/1563 [11:59<05:08, 1.52it/s]
70%|βββββββ | 1094/1563 [12:00<05:35, 1.40it/s]
70%|βββββββ | 1095/1563 [12:01<04:58, 1.57it/s]
70%|βββββββ | 1096/1563 [12:01<04:30, 1.73it/s]
70%|βββββββ | 1097/1563 [12:02<04:39, 1.67it/s]
70%|βββββββ | 1098/1563 [12:03<05:12, 1.49it/s]
70%|βββββββ | 1099/1563 [12:03<05:29, 1.41it/s]
70%|βββββββ | 1100/1563 [12:04<05:33, 1.39it/s]
{'loss': 0.1419, 'grad_norm': 9.1875, 'learning_rate': 5.937300063979527e-06, 'epoch': 0.7}
70%|βββββββ | 1101/1563 [12:05<05:09, 1.49it/s]
71%|βββββββ | 1102/1563 [12:06<05:32, 1.39it/s]
71%|βββββββ | 1103/1563 [12:06<05:05, 1.51it/s]
71%|βββββββ | 1104/1563 [12:07<04:39, 1.64it/s]
71%|βββββββ | 1105/1563 [12:07<04:25, 1.72it/s]
71%|βββββββ | 1106/1563 [12:08<05:03, 1.51it/s]
71%|βββββββ | 1107/1563 [12:09<05:04, 1.50it/s]
71%|βββββββ | 1108/1563 [12:09<05:17, 1.43it/s]
71%|βββββββ | 1109/1563 [12:10<05:40, 1.33it/s]
71%|βββββββ | 1110/1563 [12:11<04:53, 1.54it/s]
71%|βββββββ | 1111/1563 [12:11<05:16, 1.43it/s]
71%|βββββββ | 1112/1563 [12:12<04:45, 1.58it/s]
71%|βββββββ | 1113/1563 [12:13<05:12, 1.44it/s]
71%|ββββββββ | 1114/1563 [12:14<05:22, 1.39it/s]
71%|ββββββββ | 1115/1563 [12:14<05:36, 1.33it/s]
71%|ββββββββ | 1116/1563 [12:15<05:01, 1.48it/s]
71%|ββββββββ | 1117/1563 [12:16<05:16, 1.41it/s]
72%|ββββββββ | 1118/1563 [12:16<04:58, 1.49it/s]
72%|ββββββββ | 1119/1563 [12:17<05:19, 1.39it/s]
72%|ββββββββ | 1120/1563 [12:18<05:27, 1.35it/s]
72%|ββββββββ | 1121/1563 [12:19<05:20, 1.38it/s]
72%|ββββββββ | 1122/1563 [12:19<04:50, 1.52it/s]
72%|ββββββββ | 1123/1563 [12:20<05:14, 1.40it/s]
72%|ββββββββ | 1124/1563 [12:21<05:25, 1.35it/s]
72%|ββββββββ | 1125/1563 [12:21<05:27, 1.34it/s]
72%|ββββββββ | 1126/1563 [12:22<04:53, 1.49it/s]
72%|ββββββββ | 1127/1563 [12:23<04:40, 1.56it/s]
72%|ββββββββ | 1128/1563 [12:23<04:53, 1.48it/s]
72%|ββββββββ | 1129/1563 [12:24<04:26, 1.63it/s]
72%|ββββββββ | 1130/1563 [12:25<04:54, 1.47it/s]
72%|ββββββββ | 1131/1563 [12:25<05:15, 1.37it/s]
72%|ββββββββ | 1132/1563 [12:26<04:35, 1.57it/s]
72%|ββββββββ | 1133/1563 [12:27<04:38, 1.55it/s]
73%|ββββββββ | 1134/1563 [12:27<04:20, 1.65it/s]
73%|ββββββββ | 1135/1563 [12:28<04:41, 1.52it/s]
73%|ββββββββ | 1136/1563 [12:29<04:55, 1.45it/s]
73%|ββββββββ | 1137/1563 [12:29<05:16, 1.35it/s]
73%|ββββββββ | 1138/1563 [12:30<04:39, 1.52it/s]
73%|ββββββββ | 1139/1563 [12:31<04:37, 1.53it/s]
73%|ββββββββ | 1140/1563 [12:31<04:26, 1.59it/s]
73%|ββββββββ | 1141/1563 [12:32<04:41, 1.50it/s]
73%|ββββββββ | 1142/1563 [12:33<05:01, 1.40it/s]
73%|ββββββββ | 1143/1563 [12:33<05:10, 1.35it/s]
73%|ββββββββ | 1144/1563 [12:34<05:23, 1.30it/s]
73%|ββββββββ | 1145/1563 [12:35<05:30, 1.26it/s]
73%|ββββββββ | 1146/1563 [12:36<05:33, 1.25it/s]
73%|ββββββββ | 1147/1563 [12:37<05:38, 1.23it/s]
73%|ββββββββ | 1148/1563 [12:38<05:31, 1.25it/s]
74%|ββββββββ | 1149/1563 [12:38<04:52, 1.41it/s]
74%|ββββββββ | 1150/1563 [12:39<05:05, 1.35it/s]
{'loss': 0.1443, 'grad_norm': 3.234375, 'learning_rate': 5.297504798464492e-06, 'epoch': 0.74}
74%|ββββββββ | 1151/1563 [12:40<05:08, 1.34it/s]
74%|ββββββββ | 1152/1563 [12:41<05:18, 1.29it/s]
74%|ββββββββ | 1153/1563 [12:41<05:09, 1.32it/s]
74%|ββββββββ | 1154/1563 [12:42<04:29, 1.52it/s]
74%|ββββββββ | 1155/1563 [12:42<04:07, 1.65it/s]
74%|ββββββββ | 1156/1563 [12:43<04:06, 1.65it/s]
74%|ββββββββ | 1157/1563 [12:43<03:56, 1.72it/s]
74%|ββββββββ | 1158/1563 [12:44<03:40, 1.84it/s]
74%|ββββββββ | 1159/1563 [12:44<03:53, 1.73it/s]
74%|ββββββββ | 1160/1563 [12:45<04:07, 1.63it/s]
74%|ββββββββ | 1161/1563 [12:46<04:17, 1.56it/s]
74%|ββββββββ | 1162/1563 [12:47<04:32, 1.47it/s]
74%|ββββββββ | 1163/1563 [12:47<04:15, 1.56it/s]
74%|ββββββββ | 1164/1563 [12:48<04:09, 1.60it/s]
75%|ββββββββ | 1165/1563 [12:48<03:56, 1.68it/s]
75%|ββββββββ | 1166/1563 [12:49<04:27, 1.49it/s]
75%|ββββββββ | 1167/1563 [12:50<04:48, 1.37it/s]
75%|ββββββββ | 1168/1563 [12:50<04:15, 1.54it/s]
75%|ββββββββ | 1169/1563 [12:51<04:28, 1.46it/s]
75%|ββββββββ | 1170/1563 [12:52<04:24, 1.49it/s]
75%|ββββββββ | 1171/1563 [12:53<04:37, 1.41it/s]
75%|ββββββββ | 1172/1563 [12:53<04:11, 1.55it/s]
75%|ββββββββ | 1173/1563 [12:54<04:31, 1.44it/s]
75%|ββββββββ | 1174/1563 [12:55<04:50, 1.34it/s]
75%|ββββββββ | 1175/1563 [12:55<04:41, 1.38it/s]
75%|ββββββββ | 1176/1563 [12:56<04:54, 1.31it/s]
75%|ββββββββ | 1177/1563 [12:57<04:43, 1.36it/s]
75%|ββββββββ | 1178/1563 [12:57<04:11, 1.53it/s]
75%|ββββββββ | 1179/1563 [12:58<04:27, 1.44it/s]
75%|ββββββββ | 1180/1563 [12:59<04:36, 1.38it/s]
76%|ββββββββ | 1181/1563 [13:00<04:09, 1.53it/s]
76%|ββββββββ | 1182/1563 [13:00<03:47, 1.67it/s]
76%|ββββββββ | 1183/1563 [13:01<04:03, 1.56it/s]
76%|ββββββββ | 1184/1563 [13:01<03:44, 1.69it/s]
76%|ββββββββ | 1185/1563 [13:02<03:31, 1.79it/s]
76%|ββββββββ | 1186/1563 [13:02<03:17, 1.91it/s]
76%|ββββββββ | 1187/1563 [13:03<03:45, 1.67it/s]
76%|ββββββββ | 1188/1563 [13:04<04:13, 1.48it/s]
76%|ββββββββ | 1189/1563 [13:05<04:35, 1.36it/s]
76%|ββββββββ | 1190/1563 [13:05<04:28, 1.39it/s]
76%|ββββββββ | 1191/1563 [13:06<03:59, 1.56it/s]
76%|ββββββββ | 1192/1563 [13:06<04:03, 1.53it/s]
76%|ββββββββ | 1193/1563 [13:07<04:25, 1.39it/s]
76%|ββββββββ | 1194/1563 [13:08<04:10, 1.47it/s]
76%|ββββββββ | 1195/1563 [13:09<04:30, 1.36it/s]
77%|ββββββββ | 1196/1563 [13:10<04:43, 1.29it/s]
77%|ββββββββ | 1197/1563 [13:10<04:10, 1.46it/s]
77%|ββββββββ | 1198/1563 [13:11<03:44, 1.63it/s]
77%|ββββββββ | 1199/1563 [13:11<03:30, 1.73it/s]
77%|ββββββββ | 1200/1563 [13:12<03:26, 1.75it/s]
{'loss': 0.1396, 'grad_norm': 1.1015625, 'learning_rate': 4.657709532949457e-06, 'epoch': 0.77}
77%|ββββββββ | 1201/1563 [13:12<03:54, 1.54it/s]
77%|ββββββββ | 1202/1563 [13:13<04:04, 1.48it/s]
77%|ββββββββ | 1203/1563 [13:14<03:56, 1.52it/s]
77%|ββββββββ | 1204/1563 [13:14<03:30, 1.70it/s]
77%|ββββββββ | 1205/1563 [13:15<03:59, 1.50it/s]
77%|ββββββββ | 1206/1563 [13:16<03:32, 1.68it/s]
77%|ββββββββ | 1207/1563 [13:16<03:41, 1.60it/s]
77%|ββββββββ | 1208/1563 [13:17<03:47, 1.56it/s]
77%|ββββββββ | 1209/1563 [13:18<03:51, 1.53it/s]
77%|ββββββββ | 1210/1563 [13:18<03:36, 1.63it/s]
77%|ββββββββ | 1211/1563 [13:19<04:00, 1.46it/s]
78%|ββββββββ | 1212/1563 [13:19<03:37, 1.61it/s]
78%|ββββββββ | 1213/1563 [13:20<04:02, 1.45it/s]
78%|ββββββββ | 1214/1563 [13:21<03:40, 1.58it/s]
78%|ββββββββ | 1215/1563 [13:21<03:50, 1.51it/s]
78%|ββββββββ | 1216/1563 [13:22<04:08, 1.40it/s]
78%|ββββββββ | 1217/1563 [13:23<03:40, 1.57it/s]
78%|ββββββββ | 1218/1563 [13:23<03:45, 1.53it/s]
78%|ββββββββ | 1219/1563 [13:24<04:03, 1.41it/s]
78%|ββββββββ | 1220/1563 [13:25<03:46, 1.51it/s]
78%|ββββββββ | 1221/1563 [13:26<03:51, 1.48it/s]
78%|ββββββββ | 1222/1563 [13:26<03:31, 1.61it/s]
78%|ββββββββ | 1223/1563 [13:27<03:51, 1.47it/s]
78%|ββββββββ | 1224/1563 [13:28<04:00, 1.41it/s]
78%|ββββββββ | 1225/1563 [13:28<03:48, 1.48it/s]
78%|ββββββββ | 1226/1563 [13:29<03:28, 1.62it/s]
79%|ββββββββ | 1227/1563 [13:29<03:38, 1.53it/s]
79%|ββββββββ | 1228/1563 [13:30<03:14, 1.73it/s]
79%|ββββββββ | 1229/1563 [13:31<03:28, 1.60it/s]
79%|ββββββββ | 1230/1563 [13:31<03:25, 1.62it/s]
79%|ββββββββ | 1231/1563 [13:32<03:48, 1.45it/s]
79%|ββββββββ | 1232/1563 [13:33<03:54, 1.41it/s]
79%|ββββββββ | 1233/1563 [13:33<03:35, 1.53it/s]
79%|ββββββββ | 1234/1563 [13:34<03:17, 1.67it/s]
79%|ββββββββ | 1235/1563 [13:34<02:56, 1.86it/s]
79%|ββββββββ | 1236/1563 [13:35<03:18, 1.65it/s]
79%|ββββββββ | 1237/1563 [13:35<03:01, 1.80it/s]
79%|ββββββββ | 1238/1563 [13:36<03:26, 1.57it/s]
79%|ββββββββ | 1239/1563 [13:37<03:32, 1.52it/s]
79%|ββββββββ | 1240/1563 [13:37<03:14, 1.66it/s]
79%|ββββββββ | 1241/1563 [13:38<03:37, 1.48it/s]
79%|ββββββββ | 1242/1563 [13:39<03:12, 1.67it/s]
80%|ββββββββ | 1243/1563 [13:39<03:19, 1.61it/s]
80%|ββββββββ | 1244/1563 [13:40<03:25, 1.55it/s]
80%|ββββββββ | 1245/1563 [13:41<03:17, 1.61it/s]
80%|ββββββββ | 1246/1563 [13:41<02:59, 1.77it/s]
80%|ββββββββ | 1247/1563 [13:42<03:18, 1.59it/s]
80%|ββββββββ | 1248/1563 [13:43<03:31, 1.49it/s]
80%|ββββββββ | 1249/1563 [13:43<03:37, 1.45it/s]
80%|ββββββββ | 1250/1563 [13:44<03:39, 1.43it/s]
{'loss': 0.1414, 'grad_norm': 0.8828125, 'learning_rate': 4.0179142674344215e-06, 'epoch': 0.8}
80%|ββββββββ | 1251/1563 [13:45<03:53, 1.34it/s]
80%|ββββββββ | 1252/1563 [13:46<04:01, 1.29it/s]
80%|ββββββββ | 1253/1563 [13:46<03:43, 1.38it/s]
80%|ββββββββ | 1254/1563 [13:47<03:20, 1.54it/s]
80%|ββββββββ | 1255/1563 [13:47<03:09, 1.62it/s]
80%|ββββββββ | 1256/1563 [13:48<03:22, 1.51it/s]
80%|ββββββββ | 1257/1563 [13:49<03:17, 1.55it/s]
80%|ββββββββ | 1258/1563 [13:49<03:18, 1.53it/s]
81%|ββββββββ | 1259/1563 [13:50<03:05, 1.64it/s]
81%|ββββββββ | 1260/1563 [13:51<03:22, 1.49it/s]
81%|ββββββββ | 1261/1563 [13:51<03:22, 1.49it/s]
81%|ββββββββ | 1262/1563 [13:52<03:11, 1.58it/s]
81%|ββββββββ | 1263/1563 [13:52<02:54, 1.72it/s]
81%|ββββββββ | 1264/1563 [13:53<03:00, 1.65it/s]
81%|ββββββββ | 1265/1563 [13:54<03:24, 1.46it/s]
81%|ββββββββ | 1266/1563 [13:55<03:25, 1.44it/s]
81%|ββββββββ | 1267/1563 [13:55<03:22, 1.46it/s]
81%|ββββββββ | 1268/1563 [13:56<03:20, 1.47it/s]
81%|ββββββββ | 1269/1563 [13:57<03:05, 1.58it/s]
81%|βββββββββ | 1270/1563 [13:57<03:22, 1.45it/s]
81%|βββββββββ | 1271/1563 [13:58<03:20, 1.46it/s]
81%|βββββββββ | 1272/1563 [13:59<03:19, 1.46it/s]
81%|βββββββββ | 1273/1563 [13:59<03:15, 1.48it/s]
82%|βββββββββ | 1274/1563 [14:00<03:31, 1.37it/s]
82%|βββββββββ | 1275/1563 [14:01<03:25, 1.40it/s]
82%|βββββββββ | 1276/1563 [14:02<03:38, 1.31it/s]
82%|βββββββββ | 1277/1563 [14:03<03:45, 1.27it/s]
82%|βββββββββ | 1278/1563 [14:03<03:16, 1.45it/s]
82%|βββββββββ | 1279/1563 [14:04<02:56, 1.61it/s]
82%|βββββββββ | 1280/1563 [14:04<03:15, 1.45it/s]
82%|βββββββββ | 1281/1563 [14:05<02:53, 1.63it/s]
82%|βββββββββ | 1282/1563 [14:06<03:11, 1.46it/s]
82%|βββββββββ | 1283/1563 [14:06<02:57, 1.58it/s]
82%|βββββββββ | 1284/1563 [14:07<03:11, 1.46it/s]
82%|βββββββββ | 1285/1563 [14:07<02:55, 1.58it/s]
82%|βββββββββ | 1286/1563 [14:08<02:39, 1.74it/s]
82%|βββββββββ | 1287/1563 [14:09<02:54, 1.58it/s]
82%|βββββββββ | 1288/1563 [14:09<02:52, 1.59it/s]
82%|βββββββββ | 1289/1563 [14:10<02:36, 1.75it/s]
83%|βββββββββ | 1290/1563 [14:10<02:27, 1.85it/s]
83%|βββββββββ | 1291/1563 [14:11<02:44, 1.65it/s]
83%|βββββββββ | 1292/1563 [14:12<02:51, 1.58it/s]
83%|βββββββββ | 1293/1563 [14:12<02:33, 1.76it/s]
83%|βββββββββ | 1294/1563 [14:13<02:54, 1.54it/s]
83%|βββββββββ | 1295/1563 [14:14<03:10, 1.41it/s]
83%|βββββββββ | 1296/1563 [14:15<03:22, 1.32it/s]
83%|βββββββββ | 1297/1563 [14:15<03:04, 1.44it/s]
83%|βββββββββ | 1298/1563 [14:16<03:06, 1.42it/s]
83%|βββββββββ | 1299/1563 [14:16<02:48, 1.56it/s]
83%|βββββββββ | 1300/1563 [14:17<02:33, 1.71it/s]
{'loss': 0.1409, 'grad_norm': 8.8125, 'learning_rate': 3.378119001919386e-06, 'epoch': 0.83}
83%|βββββββββ | 1301/1563 [14:18<02:44, 1.60it/s]
83%|βββββββββ | 1302/1563 [14:18<02:26, 1.78it/s]
83%|βββββββββ | 1303/1563 [14:18<02:18, 1.88it/s]
83%|βββββββββ | 1304/1563 [14:19<02:38, 1.64it/s]
83%|βββββββββ | 1305/1563 [14:20<02:34, 1.67it/s]
84%|βββββββββ | 1306/1563 [14:20<02:23, 1.80it/s]
84%|βββββββββ | 1307/1563 [14:21<02:40, 1.60it/s]
84%|βββββββββ | 1308/1563 [14:22<02:54, 1.46it/s]
84%|βββββββββ | 1309/1563 [14:22<02:37, 1.61it/s]
84%|βββββββββ | 1310/1563 [14:23<02:43, 1.55it/s]
84%|βββββββββ | 1311/1563 [14:24<02:33, 1.64it/s]
84%|βββββββββ | 1312/1563 [14:24<02:24, 1.73it/s]
84%|βββββββββ | 1313/1563 [14:25<02:31, 1.65it/s]
84%|βββββββββ | 1314/1563 [14:25<02:20, 1.77it/s]
84%|βββββββββ | 1315/1563 [14:26<02:19, 1.78it/s]
84%|βββββββββ | 1316/1563 [14:27<02:38, 1.55it/s]
84%|βββββββββ | 1317/1563 [14:27<02:30, 1.64it/s]
84%|βββββββββ | 1318/1563 [14:28<02:19, 1.76it/s]
84%|βββββββββ | 1319/1563 [14:28<02:09, 1.88it/s]
84%|βββββββββ | 1320/1563 [14:29<02:28, 1.63it/s]
85%|βββββββββ | 1321/1563 [14:29<02:17, 1.76it/s]
85%|βββββββββ | 1322/1563 [14:30<02:22, 1.70it/s]
85%|βββββββββ | 1323/1563 [14:31<02:18, 1.73it/s]
85%|βββββββββ | 1324/1563 [14:31<02:22, 1.68it/s]
85%|βββββββββ | 1325/1563 [14:32<02:21, 1.68it/s]
85%|βββββββββ | 1326/1563 [14:33<02:39, 1.49it/s]
85%|βββββββββ | 1327/1563 [14:33<02:44, 1.44it/s]
85%|βββββββββ | 1328/1563 [14:34<02:25, 1.62it/s]
85%|βββββββββ | 1329/1563 [14:34<02:24, 1.62it/s]
85%|βββββββββ | 1330/1563 [14:35<02:16, 1.70it/s]
85%|βββββββββ | 1331/1563 [14:36<02:33, 1.51it/s]
85%|βββββββββ | 1332/1563 [14:36<02:35, 1.48it/s]
85%|βββββββββ | 1333/1563 [14:37<02:17, 1.67it/s]
85%|βββββββββ | 1334/1563 [14:37<02:14, 1.71it/s]
85%|βββββββββ | 1335/1563 [14:38<02:02, 1.85it/s]
85%|βββββββββ | 1336/1563 [14:39<02:22, 1.59it/s]
86%|βββββββββ | 1337/1563 [14:40<02:34, 1.46it/s]
86%|βββββββββ | 1338/1563 [14:40<02:45, 1.36it/s]
86%|βββββββββ | 1339/1563 [14:41<02:49, 1.32it/s]
86%|βββββββββ | 1340/1563 [14:42<02:55, 1.27it/s]
86%|βββββββββ | 1341/1563 [14:43<02:47, 1.32it/s]
86%|βββββββββ | 1342/1563 [14:43<02:44, 1.34it/s]
86%|βββββββββ | 1343/1563 [14:44<02:39, 1.38it/s]
86%|βββββββββ | 1344/1563 [14:45<02:46, 1.32it/s]
86%|βββββββββ | 1345/1563 [14:46<02:51, 1.27it/s]
86%|βββββββββ | 1346/1563 [14:46<02:36, 1.39it/s]
86%|βββββββββ | 1347/1563 [14:47<02:44, 1.31it/s]
86%|βββββββββ | 1348/1563 [14:48<02:29, 1.44it/s]
86%|βββββββββ | 1349/1563 [14:48<02:19, 1.53it/s]
86%|βββββββββ | 1350/1563 [14:49<02:11, 1.62it/s]
{'loss': 0.1395, 'grad_norm': 12.1875, 'learning_rate': 2.738323736404351e-06, 'epoch': 0.86}
86%|βββββββββ | 1351/1563 [14:49<02:09, 1.63it/s]
87%|βββββββββ | 1352/1563 [14:50<02:25, 1.45it/s]
87%|βββββββββ | 1353/1563 [14:51<02:32, 1.37it/s]
87%|βββββββββ | 1354/1563 [14:52<02:19, 1.50it/s]
87%|βββββββββ | 1355/1563 [14:52<02:24, 1.44it/s]
87%|βββββββββ | 1356/1563 [14:53<02:34, 1.34it/s]
87%|βββββββββ | 1357/1563 [14:54<02:13, 1.54it/s]
87%|βββββββββ | 1358/1563 [14:54<02:18, 1.49it/s]
87%|βββββββββ | 1359/1563 [14:55<02:28, 1.37it/s]
87%|βββββββββ | 1360/1563 [14:56<02:25, 1.40it/s]
87%|βββββββββ | 1361/1563 [14:57<02:27, 1.37it/s]
87%|βββββββββ | 1362/1563 [14:57<02:25, 1.38it/s]
87%|βββββββββ | 1363/1563 [14:58<02:31, 1.32it/s]
87%|βββββββββ | 1364/1563 [14:59<02:32, 1.30it/s]
87%|βββββββββ | 1365/1563 [15:00<02:19, 1.42it/s]
87%|βββββββββ | 1366/1563 [15:01<02:26, 1.34it/s]
87%|βββββββββ | 1367/1563 [15:01<02:14, 1.46it/s]
88%|βββββββββ | 1368/1563 [15:02<02:23, 1.36it/s]
88%|βββββββββ | 1369/1563 [15:03<02:30, 1.29it/s]
88%|βββββββββ | 1370/1563 [15:03<02:20, 1.37it/s]
88%|βββββββββ | 1371/1563 [15:04<02:04, 1.54it/s]
88%|βββββββββ | 1372/1563 [15:05<02:06, 1.51it/s]
88%|βββββββββ | 1373/1563 [15:05<02:14, 1.42it/s]
88%|βββββββββ | 1374/1563 [15:06<02:00, 1.57it/s]
88%|βββββββββ | 1375/1563 [15:06<01:49, 1.72it/s]
88%|βββββββββ | 1376/1563 [15:07<01:47, 1.75it/s]
88%|βββββββββ | 1377/1563 [15:07<01:48, 1.72it/s]
88%|βββββββββ | 1378/1563 [15:08<01:38, 1.88it/s]
88%|βββββββββ | 1379/1563 [15:08<01:38, 1.87it/s]
88%|βββββββββ | 1380/1563 [15:09<01:34, 1.94it/s]
88%|βββββββββ | 1381/1563 [15:09<01:29, 2.04it/s]
88%|βββββββββ | 1382/1563 [15:10<01:27, 2.07it/s]
88%|βββββββββ | 1383/1563 [15:10<01:32, 1.95it/s]
89%|βββββββββ | 1384/1563 [15:11<01:28, 2.03it/s]
89%|βββββββββ | 1385/1563 [15:11<01:29, 1.99it/s]
89%|βββββββββ | 1386/1563 [15:12<01:44, 1.70it/s]
89%|βββββββββ | 1387/1563 [15:13<01:36, 1.83it/s]
89%|βββββββββ | 1388/1563 [15:13<01:50, 1.58it/s]
89%|βββββββββ | 1389/1563 [15:14<01:42, 1.70it/s]
89%|βββββββββ | 1390/1563 [15:14<01:38, 1.75it/s]
89%|βββββββββ | 1391/1563 [15:15<01:34, 1.82it/s]
89%|βββββββββ | 1392/1563 [15:15<01:31, 1.86it/s]
89%|βββββββββ | 1393/1563 [15:16<01:48, 1.57it/s]
89%|βββββββββ | 1394/1563 [15:17<01:51, 1.52it/s]
89%|βββββββββ | 1395/1563 [15:18<01:53, 1.48it/s]
89%|βββββββββ | 1396/1563 [15:19<02:01, 1.37it/s]
89%|βββββββββ | 1397/1563 [15:19<02:02, 1.35it/s]
89%|βββββββββ | 1398/1563 [15:20<02:08, 1.28it/s]
90%|βββββββββ | 1399/1563 [15:21<01:52, 1.46it/s]
90%|βββββββββ | 1400/1563 [15:21<01:40, 1.62it/s]
{'loss': 0.1385, 'grad_norm': 0.73828125, 'learning_rate': 2.0985284708893156e-06, 'epoch': 0.9}
90%|βββββββββ | 1401/1563 [15:22<01:33, 1.73it/s]
90%|βββββββββ | 1402/1563 [15:22<01:26, 1.86it/s]
90%|βββββββββ | 1403/1563 [15:23<01:22, 1.94it/s]
90%|βββββββββ | 1404/1563 [15:23<01:26, 1.85it/s]
90%|βββββββββ | 1405/1563 [15:24<01:40, 1.58it/s]
90%|βββββββββ | 1406/1563 [15:25<01:49, 1.43it/s]
90%|βββββββββ | 1407/1563 [15:26<01:48, 1.44it/s]
90%|βββββββββ | 1408/1563 [15:26<01:36, 1.60it/s]
90%|βββββββββ | 1409/1563 [15:26<01:30, 1.70it/s]
90%|βββββββββ | 1410/1563 [15:27<01:42, 1.49it/s]
90%|βββββββββ | 1411/1563 [15:28<01:42, 1.49it/s]
90%|βββββββββ | 1412/1563 [15:28<01:30, 1.68it/s]
90%|βββββββββ | 1413/1563 [15:29<01:22, 1.82it/s]
90%|βββββββββ | 1414/1563 [15:29<01:18, 1.91it/s]
91%|βββββββββ | 1415/1563 [15:30<01:25, 1.72it/s]
91%|βββββββββ | 1416/1563 [15:31<01:37, 1.51it/s]
91%|βββββββββ | 1417/1563 [15:32<01:36, 1.52it/s]
91%|βββββββββ | 1418/1563 [15:32<01:35, 1.52it/s]
91%|βββββββββ | 1419/1563 [15:33<01:41, 1.42it/s]
91%|βββββββββ | 1420/1563 [15:34<01:40, 1.42it/s]
91%|βββββββββ | 1421/1563 [15:34<01:35, 1.49it/s]
91%|βββββββββ | 1422/1563 [15:35<01:33, 1.51it/s]
91%|βββββββββ | 1423/1563 [15:36<01:35, 1.47it/s]
91%|βββββββββ | 1424/1563 [15:36<01:30, 1.53it/s]
91%|βββββββββ | 1425/1563 [15:37<01:29, 1.55it/s]
91%|βββββββββ | 1426/1563 [15:37<01:23, 1.63it/s]
91%|ββββββββββ| 1427/1563 [15:38<01:32, 1.47it/s]
91%|ββββββββββ| 1428/1563 [15:39<01:30, 1.48it/s]
91%|ββββββββββ| 1429/1563 [15:40<01:35, 1.40it/s]
91%|ββββββββββ| 1430/1563 [15:40<01:31, 1.45it/s]
92%|ββββββββββ| 1431/1563 [15:41<01:28, 1.49it/s]
92%|ββββββββββ| 1432/1563 [15:42<01:22, 1.58it/s]
92%|ββββββββββ| 1433/1563 [15:42<01:26, 1.50it/s]
92%|ββββββββββ| 1434/1563 [15:43<01:33, 1.39it/s]
92%|ββββββββββ| 1435/1563 [15:44<01:38, 1.30it/s]
92%|ββββββββββ| 1436/1563 [15:45<01:39, 1.27it/s]
92%|ββββββββββ| 1437/1563 [15:46<01:42, 1.23it/s]
92%|ββββββββββ| 1438/1563 [15:47<01:41, 1.23it/s]
92%|ββββββββββ| 1439/1563 [15:47<01:35, 1.30it/s]
92%|ββββββββββ| 1440/1563 [15:48<01:35, 1.29it/s]
92%|ββββββββββ| 1441/1563 [15:49<01:37, 1.25it/s]
92%|ββββββββββ| 1442/1563 [15:49<01:25, 1.41it/s]
92%|ββββββββββ| 1443/1563 [15:50<01:30, 1.33it/s]
92%|ββββββββββ| 1444/1563 [15:51<01:21, 1.46it/s]
92%|ββββββββββ| 1445/1563 [15:51<01:16, 1.55it/s]
93%|ββββββββββ| 1446/1563 [15:52<01:14, 1.56it/s]
93%|ββββββββββ| 1447/1563 [15:52<01:12, 1.59it/s]
93%|ββββββββββ| 1448/1563 [15:53<01:12, 1.59it/s]
93%|ββββββββββ| 1449/1563 [15:54<01:06, 1.72it/s]
93%|ββββββββββ| 1450/1563 [15:54<01:02, 1.80it/s]
{'loss': 0.1388, 'grad_norm': 14.0625, 'learning_rate': 1.4587332053742803e-06, 'epoch': 0.93}
93%|ββββββββββ| 1451/1563 [15:55<01:05, 1.70it/s]
93%|ββββββββββ| 1452/1563 [15:55<01:01, 1.79it/s]
93%|ββββββββββ| 1453/1563 [15:56<01:07, 1.63it/s]
93%|ββββββββββ| 1454/1563 [15:56<01:01, 1.76it/s]
93%|ββββββββββ| 1455/1563 [15:57<01:04, 1.67it/s]
93%|ββββββββββ| 1456/1563 [15:58<01:03, 1.69it/s]
93%|ββββββββββ| 1457/1563 [15:58<01:05, 1.62it/s]
93%|ββββββββββ| 1458/1563 [15:59<00:59, 1.77it/s]
93%|ββββββββββ| 1459/1563 [15:59<00:54, 1.92it/s]
93%|ββββββββββ| 1460/1563 [16:00<00:58, 1.76it/s]
93%|ββββββββββ| 1461/1563 [16:01<01:04, 1.57it/s]
94%|ββββββββββ| 1462/1563 [16:01<01:07, 1.49it/s]
94%|ββββββββββ| 1463/1563 [16:02<01:12, 1.38it/s]
94%|ββββββββββ| 1464/1563 [16:03<01:04, 1.54it/s]
94%|ββββββββββ| 1465/1563 [16:03<01:03, 1.55it/s]
94%|ββββββββββ| 1466/1563 [16:04<01:09, 1.40it/s]
94%|ββββββββββ| 1467/1563 [16:05<01:09, 1.38it/s]
94%|ββββββββββ| 1468/1563 [16:05<01:00, 1.56it/s]
94%|ββββββββββ| 1469/1563 [16:06<00:54, 1.72it/s]
94%|ββββββββββ| 1470/1563 [16:07<01:01, 1.52it/s]
94%|ββββββββββ| 1471/1563 [16:07<01:00, 1.53it/s]
94%|ββββββββββ| 1472/1563 [16:08<00:55, 1.64it/s]
94%|ββββββββββ| 1473/1563 [16:08<00:52, 1.73it/s]
94%|ββββββββββ| 1474/1563 [16:09<00:57, 1.54it/s]
94%|ββββββββββ| 1475/1563 [16:10<00:52, 1.69it/s]
94%|ββββββββββ| 1476/1563 [16:10<00:56, 1.53it/s]
94%|ββββββββββ| 1477/1563 [16:11<00:53, 1.61it/s]
95%|ββββββββββ| 1478/1563 [16:12<00:50, 1.68it/s]
95%|ββββββββββ| 1479/1563 [16:12<00:47, 1.76it/s]
95%|ββββββββββ| 1480/1563 [16:13<00:50, 1.64it/s]
95%|ββββββββββ| 1481/1563 [16:14<00:54, 1.50it/s]
95%|ββββββββββ| 1482/1563 [16:14<00:57, 1.40it/s]
95%|ββββββββββ| 1483/1563 [16:15<00:56, 1.42it/s]
95%|ββββββββββ| 1484/1563 [16:16<00:55, 1.43it/s]
95%|ββββββββββ| 1485/1563 [16:17<00:58, 1.33it/s]
95%|ββββββββββ| 1486/1563 [16:17<00:59, 1.29it/s]
95%|ββββββββββ| 1487/1563 [16:18<00:58, 1.29it/s]
95%|ββββββββββ| 1488/1563 [16:19<00:56, 1.33it/s]
95%|ββββββββββ| 1489/1563 [16:20<00:54, 1.36it/s]
95%|ββββββββββ| 1490/1563 [16:20<00:55, 1.31it/s]
95%|ββββββββββ| 1491/1563 [16:21<00:48, 1.49it/s]
95%|ββββββββββ| 1492/1563 [16:21<00:44, 1.59it/s]
96%|ββββββββββ| 1493/1563 [16:22<00:41, 1.69it/s]
96%|ββββββββββ| 1494/1563 [16:23<00:39, 1.74it/s]
96%|ββββββββββ| 1495/1563 [16:23<00:39, 1.73it/s]
96%|ββββββββββ| 1496/1563 [16:24<00:41, 1.60it/s]
96%|ββββββββββ| 1497/1563 [16:25<00:43, 1.53it/s]
96%|ββββββββββ| 1498/1563 [16:25<00:44, 1.46it/s]
96%|ββββββββββ| 1499/1563 [16:26<00:39, 1.60it/s]
96%|ββββββββββ| 1500/1563 [16:26<00:40, 1.57it/s]
{'loss': 0.1387, 'grad_norm': 1.71875, 'learning_rate': 8.18937939859245e-07, 'epoch': 0.96}
96%|ββββββββββ| 1500/1563 [16:27<00:40, 1.57it/s]
96%|ββββββββββ| 1501/1563 [16:27<00:43, 1.41it/s]
96%|ββββββββββ| 1502/1563 [16:28<00:39, 1.54it/s]
96%|ββββββββββ| 1503/1563 [16:29<00:40, 1.49it/s]
96%|ββββββββββ| 1504/1563 [16:29<00:40, 1.44it/s]
96%|ββββββββββ| 1505/1563 [16:30<00:39, 1.47it/s]
96%|ββββββββββ| 1506/1563 [16:31<00:41, 1.39it/s]
96%|ββββββββββ| 1507/1563 [16:31<00:36, 1.54it/s]
96%|ββββββββββ| 1508/1563 [16:32<00:39, 1.40it/s]
97%|ββββββββββ| 1509/1563 [16:33<00:40, 1.33it/s]
97%|ββββββββββ| 1510/1563 [16:33<00:35, 1.50it/s]
97%|ββββββββββ| 1511/1563 [16:34<00:36, 1.41it/s]
97%|ββββββββββ| 1512/1563 [16:35<00:33, 1.53it/s]
97%|ββββββββββ| 1513/1563 [16:35<00:29, 1.68it/s]
97%|ββββββββββ| 1514/1563 [16:36<00:27, 1.77it/s]
97%|ββββββββββ| 1515/1563 [16:36<00:28, 1.69it/s]
97%|ββββββββββ| 1516/1563 [16:37<00:26, 1.81it/s]
97%|ββββββββββ| 1517/1563 [16:38<00:29, 1.56it/s]
97%|ββββββββββ| 1518/1563 [16:39<00:31, 1.43it/s]
97%|ββββββββββ| 1519/1563 [16:39<00:27, 1.61it/s]
97%|ββββββββββ| 1520/1563 [16:40<00:29, 1.44it/s]
97%|ββββββββββ| 1521/1563 [16:40<00:26, 1.61it/s]
97%|ββββββββββ| 1522/1563 [16:41<00:28, 1.44it/s]
97%|ββββββββββ| 1523/1563 [16:42<00:28, 1.40it/s]
98%|ββββββββββ| 1524/1563 [16:43<00:29, 1.33it/s]
98%|ββββββββββ| 1525/1563 [16:43<00:24, 1.52it/s]
98%|ββββββββββ| 1526/1563 [16:44<00:21, 1.70it/s]
98%|ββββββββββ| 1527/1563 [16:44<00:21, 1.71it/s]
98%|ββββββββββ| 1528/1563 [16:45<00:22, 1.56it/s]
98%|ββββββββββ| 1529/1563 [16:45<00:20, 1.68it/s]
98%|ββββββββββ| 1530/1563 [16:46<00:19, 1.67it/s]
98%|ββββββββββ| 1531/1563 [16:47<00:18, 1.77it/s]
98%|ββββββββββ| 1532/1563 [16:47<00:18, 1.68it/s]
98%|ββββββββββ| 1533/1563 [16:48<00:18, 1.58it/s]
98%|ββββββββββ| 1534/1563 [16:49<00:20, 1.44it/s]
98%|ββββββββββ| 1535/1563 [16:49<00:17, 1.59it/s]
98%|ββββββββββ| 1536/1563 [16:50<00:18, 1.48it/s]
98%|ββββββββββ| 1537/1563 [16:51<00:20, 1.30it/s]
98%|ββββββββββ| 1538/1563 [16:52<00:19, 1.31it/s]
98%|ββββββββββ| 1539/1563 [16:52<00:17, 1.40it/s]
99%|ββββββββββ| 1540/1563 [16:53<00:15, 1.44it/s]
99%|ββββββββββ| 1541/1563 [16:54<00:14, 1.49it/s]
99%|ββββββββββ| 1542/1563 [16:54<00:12, 1.65it/s]
99%|ββββββββββ| 1543/1563 [16:55<00:11, 1.75it/s]
99%|ββββββββββ| 1544/1563 [16:55<00:10, 1.88it/s]
99%|ββββββββββ| 1545/1563 [16:56<00:10, 1.72it/s]
99%|ββββββββββ| 1546/1563 [16:56<00:10, 1.57it/s]
99%|ββββββββββ| 1547/1563 [16:57<00:10, 1.55it/s]
99%|ββββββββββ| 1548/1563 [16:58<00:08, 1.67it/s]
99%|ββββββββββ| 1549/1563 [16:58<00:07, 1.76it/s]
99%|ββββββββββ| 1550/1563 [16:59<00:08, 1.53it/s]
{'loss': 0.1407, 'grad_norm': 2.4375, 'learning_rate': 1.7914267434420988e-07, 'epoch': 0.99}
99%|ββββββββββ| 1551/1563 [16:59<00:07, 1.69it/s]
99%|ββββββββββ| 1552/1563 [17:00<00:07, 1.49it/s]
99%|ββββββββββ| 1553/1563 [17:01<00:06, 1.48it/s]
99%|ββββββββββ| 1554/1563 [17:02<00:06, 1.37it/s]
99%|ββββββββββ| 1555/1563 [17:02<00:05, 1.45it/s]
100%|ββββββββββ| 1556/1563 [17:03<00:04, 1.53it/s]
100%|ββββββββββ| 1557/1563 [17:03<00:03, 1.66it/s]
100%|ββββββββββ| 1558/1563 [17:04<00:03, 1.47it/s]
100%|ββββββββββ| 1559/1563 [17:05<00:02, 1.47it/s]
100%|ββββββββββ| 1560/1563 [17:06<00:02, 1.37it/s]
100%|ββββββββββ| 1561/1563 [17:07<00:01, 1.36it/s]
100%|ββββββββββ| 1562/1563 [17:07<00:00, 1.50it/s]
100%|ββββββββββ| 1563/1563 [17:08<00:00, 1.51it/s]
{'train_runtime': 1033.8148, 'train_samples_per_second': 193.458, 'train_steps_per_second': 1.512, 'train_loss': 0.2836286289449388, 'epoch': 1.0}
100%|ββββββββββ| 1563/1563 [17:12<00:00, 1.51it/s]
model.safetensors: 0%| | 0.00/2.00G [00:00<?, ?B/s]
tokenizer.model: 0%| | 0.00/4.69M [00:00<?, ?B/s]
training_args.bin: 0%| | 0.00/5.43k [00:00<?, ?B/s]
Upload 3 LFS files: 0%| | 0/3 [00:00<?, ?it/s]
tokenizer.model: 0%| | 16.4k/4.69M [00:00<00:31, 147kB/s]
training_args.bin: 100%|ββββββββββ| 5.43k/5.43k [00:00<00:00, 41.9kB/s]
model.safetensors: 0%| | 2.59M/2.00G [00:00<01:57, 17.0MB/s]
tokenizer.model: 100%|ββββββββββ| 4.69M/4.69M [00:00<00:00, 12.1MB/s]
model.safetensors: 1%| | 16.0M/2.00G [00:00<01:07, 29.5MB/s]
model.safetensors: 2%|β | 32.0M/2.00G [00:00<00:45, 43.2MB/s]
model.safetensors: 2%|β | 48.0M/2.00G [00:01<00:38, 51.1MB/s]
model.safetensors: 3%|β | 64.0M/2.00G [00:01<00:33, 58.4MB/s]
model.safetensors: 4%|β | 80.0M/2.00G [00:01<00:32, 59.3MB/s]
model.safetensors: 5%|β | 96.0M/2.00G [00:01<00:37, 51.0MB/s]
model.safetensors: 6%|β | 112M/2.00G [00:02<00:34, 54.8MB/s]
model.safetensors: 6%|β | 128M/2.00G [00:02<00:32, 58.1MB/s]
model.safetensors: 7%|β | 144M/2.00G [00:02<00:31, 59.2MB/s]
model.safetensors: 8%|β | 160M/2.00G [00:02<00:30, 61.0MB/s]
model.safetensors: 9%|β | 176M/2.00G [00:03<00:30, 60.5MB/s]
model.safetensors: 10%|β | 192M/2.00G [00:03<00:30, 59.4MB/s]
model.safetensors: 10%|β | 208M/2.00G [00:03<00:28, 62.6MB/s]
model.safetensors: 11%|β | 224M/2.00G [00:03<00:28, 61.8MB/s]
model.safetensors: 12%|ββ | 240M/2.00G [00:04<00:27, 63.4MB/s]
model.safetensors: 13%|ββ | 256M/2.00G [00:04<00:27, 63.9MB/s]
model.safetensors: 14%|ββ | 272M/2.00G [00:04<00:28, 59.9MB/s]
model.safetensors: 14%|ββ | 288M/2.00G [00:05<00:28, 59.3MB/s]
model.safetensors: 15%|ββ | 304M/2.00G [00:05<00:30, 55.6MB/s]
model.safetensors: 16%|ββ | 320M/2.00G [00:05<00:30, 55.2MB/s]
model.safetensors: 17%|ββ | 336M/2.00G [00:05<00:28, 58.1MB/s]
model.safetensors: 18%|ββ | 352M/2.00G [00:06<00:27, 59.5MB/s]
model.safetensors: 18%|ββ | 368M/2.00G [00:06<00:26, 61.6MB/s]
model.safetensors: 19%|ββ | 384M/2.00G [00:06<00:26, 61.9MB/s]
model.safetensors: 20%|ββ | 400M/2.00G [00:06<00:26, 60.8MB/s]
model.safetensors: 21%|ββ | 416M/2.00G [00:07<00:26, 59.2MB/s]
model.safetensors: 22%|βββ | 432M/2.00G [00:07<00:25, 61.6MB/s]
model.safetensors: 22%|βββ | 448M/2.00G [00:07<00:24, 62.3MB/s]
model.safetensors: 23%|βββ | 464M/2.00G [00:07<00:22, 67.0MB/s]
model.safetensors: 24%|βββ | 480M/2.00G [00:08<00:23, 63.9MB/s]
model.safetensors: 25%|βββ | 496M/2.00G [00:08<00:23, 63.5MB/s]
model.safetensors: 26%|βββ | 512M/2.00G [00:08<00:30, 48.7MB/s]
model.safetensors: 26%|βββ | 528M/2.00G [00:09<00:28, 51.2MB/s]
model.safetensors: 27%|βββ | 544M/2.00G [00:09<00:27, 52.4MB/s]
model.safetensors: 28%|βββ | 560M/2.00G [00:09<00:25, 55.5MB/s]
model.safetensors: 29%|βββ | 576M/2.00G [00:10<00:25, 56.4MB/s]
model.safetensors: 30%|βββ | 592M/2.00G [00:10<00:24, 57.8MB/s]
model.safetensors: 30%|βββ | 608M/2.00G [00:10<00:22, 60.9MB/s]
model.safetensors: 31%|βββ | 624M/2.00G [00:10<00:23, 59.7MB/s]
model.safetensors: 32%|ββββ | 640M/2.00G [00:11<00:21, 63.3MB/s]
model.safetensors: 33%|ββββ | 656M/2.00G [00:11<00:21, 63.3MB/s]
model.safetensors: 34%|ββββ | 672M/2.00G [00:11<00:21, 61.4MB/s]
model.safetensors: 34%|ββββ | 688M/2.00G [00:11<00:20, 63.6MB/s]
model.safetensors: 35%|ββββ | 704M/2.00G [00:12<00:20, 63.2MB/s]
model.safetensors: 36%|ββββ | 720M/2.00G [00:12<00:20, 63.7MB/s]
model.safetensors: 37%|ββββ | 736M/2.00G [00:12<00:19, 64.2MB/s]
model.safetensors: 38%|ββββ | 752M/2.00G [00:12<00:19, 65.5MB/s]
model.safetensors: 38%|ββββ | 768M/2.00G [00:13<00:19, 62.7MB/s]
model.safetensors: 39%|ββββ | 784M/2.00G [00:13<00:18, 64.2MB/s]
model.safetensors: 40%|ββββ | 800M/2.00G [00:13<00:18, 64.1MB/s]
model.safetensors: 41%|ββββ | 816M/2.00G [00:13<00:19, 60.3MB/s]
model.safetensors: 42%|βββββ | 832M/2.00G [00:14<00:19, 60.3MB/s]
model.safetensors: 42%|βββββ | 848M/2.00G [00:14<00:19, 59.6MB/s]
model.safetensors: 43%|βββββ | 864M/2.00G [00:14<00:18, 61.2MB/s]
model.safetensors: 44%|βββββ | 880M/2.00G [00:14<00:18, 62.0MB/s]
model.safetensors: 45%|βββββ | 896M/2.00G [00:15<00:18, 60.1MB/s]
model.safetensors: 46%|βββββ | 912M/2.00G [00:15<00:19, 55.9MB/s]
model.safetensors: 46%|βββββ | 928M/2.00G [00:15<00:17, 61.9MB/s]
model.safetensors: 47%|βββββ | 944M/2.00G [00:15<00:16, 64.2MB/s]
model.safetensors: 48%|βββββ | 960M/2.00G [00:16<00:15, 65.8MB/s]
model.safetensors: 49%|βββββ | 976M/2.00G [00:16<00:17, 59.3MB/s]
model.safetensors: 50%|βββββ | 992M/2.00G [00:16<00:17, 56.2MB/s]
model.safetensors: 50%|βββββ | 1.01G/2.00G [00:17<00:17, 56.3MB/s]
model.safetensors: 51%|βββββ | 1.02G/2.00G [00:17<00:16, 59.3MB/s]
model.safetensors: 52%|ββββββ | 1.04G/2.00G [00:17<00:15, 62.5MB/s]
model.safetensors: 53%|ββββββ | 1.06G/2.00G [00:17<00:15, 60.8MB/s]
model.safetensors: 54%|ββββββ | 1.07G/2.00G [00:18<00:16, 57.2MB/s]
model.safetensors: 54%|ββββββ | 1.09G/2.00G [00:18<00:14, 61.8MB/s]
model.safetensors: 55%|ββββββ | 1.10G/2.00G [00:18<00:13, 64.4MB/s]
model.safetensors: 56%|ββββββ | 1.12G/2.00G [00:18<00:14, 62.0MB/s]
model.safetensors: 57%|ββββββ | 1.14G/2.00G [00:19<00:14, 60.2MB/s]
model.safetensors: 58%|ββββββ | 1.15G/2.00G [00:19<00:13, 62.4MB/s]
model.safetensors: 58%|ββββββ | 1.17G/2.00G [00:19<00:14, 59.0MB/s]
model.safetensors: 59%|ββββββ | 1.18G/2.00G [00:19<00:13, 60.8MB/s]
model.safetensors: 60%|ββββββ | 1.20G/2.00G [00:20<00:13, 58.5MB/s]
model.safetensors: 61%|ββββββ | 1.22G/2.00G [00:20<00:13, 59.7MB/s]
model.safetensors: 62%|βββββββ | 1.23G/2.00G [00:20<00:12, 60.3MB/s]
model.safetensors: 62%|βββββββ | 1.25G/2.00G [00:20<00:12, 59.7MB/s]
model.safetensors: 63%|βββββββ | 1.26G/2.00G [00:21<00:12, 61.1MB/s]
model.safetensors: 64%|βββββββ | 1.28G/2.00G [00:21<00:12, 58.0MB/s]
model.safetensors: 65%|βββββββ | 1.30G/2.00G [00:21<00:12, 58.4MB/s]
model.safetensors: 66%|βββββββ | 1.31G/2.00G [00:22<00:12, 53.0MB/s]
model.safetensors: 66%|βββββββ | 1.33G/2.00G [00:22<00:12, 55.1MB/s]
model.safetensors: 67%|βββββββ | 1.34G/2.00G [00:22<00:11, 54.7MB/s]
model.safetensors: 68%|βββββββ | 1.36G/2.00G [00:22<00:10, 61.2MB/s]
model.safetensors: 69%|βββββββ | 1.38G/2.00G [00:23<00:10, 60.3MB/s]
model.safetensors: 70%|βββββββ | 1.39G/2.00G [00:23<00:11, 53.5MB/s]
model.safetensors: 70%|βββββββ | 1.41G/2.00G [00:23<00:10, 55.6MB/s]
model.safetensors: 71%|βββββββ | 1.42G/2.00G [00:24<00:09, 61.4MB/s]
model.safetensors: 72%|ββββββββ | 1.44G/2.00G [00:24<00:08, 66.8MB/s]
model.safetensors: 73%|ββββββββ | 1.46G/2.00G [00:24<00:08, 65.9MB/s]
model.safetensors: 74%|ββββββββ | 1.47G/2.00G [00:24<00:08, 61.6MB/s]
model.safetensors: 74%|ββββββββ | 1.49G/2.00G [00:25<00:08, 61.6MB/s]
model.safetensors: 75%|ββββββββ | 1.50G/2.00G [00:25<00:08, 60.4MB/s]
model.safetensors: 76%|ββββββββ | 1.52G/2.00G [00:25<00:07, 66.0MB/s]
model.safetensors: 77%|ββββββββ | 1.54G/2.00G [00:25<00:07, 63.2MB/s]
model.safetensors: 78%|ββββββββ | 1.55G/2.00G [00:26<00:07, 61.7MB/s]
model.safetensors: 78%|ββββββββ | 1.57G/2.00G [00:26<00:06, 63.3MB/s]
model.safetensors: 79%|ββββββββ | 1.58G/2.00G [00:26<00:06, 64.1MB/s]
model.safetensors: 80%|ββββββββ | 1.60G/2.00G [00:26<00:06, 63.7MB/s]
model.safetensors: 81%|ββββββββ | 1.62G/2.00G [00:27<00:05, 66.3MB/s]
model.safetensors: 82%|βββββββββ | 1.63G/2.00G [00:27<00:05, 68.0MB/s]
model.safetensors: 82%|βββββββββ | 1.65G/2.00G [00:27<00:05, 63.4MB/s]
model.safetensors: 83%|βββββββββ | 1.66G/2.00G [00:27<00:05, 60.8MB/s]
model.safetensors: 84%|βββββββββ | 1.68G/2.00G [00:28<00:05, 61.9MB/s]
model.safetensors: 85%|βββββββββ | 1.70G/2.00G [00:28<00:04, 62.6MB/s]
model.safetensors: 86%|βββββββββ | 1.71G/2.00G [00:28<00:05, 55.2MB/s]
model.safetensors: 86%|βββββββββ | 1.73G/2.00G [00:29<00:07, 34.8MB/s]
model.safetensors: 87%|βββββββββ | 1.74G/2.00G [00:29<00:06, 41.9MB/s]
model.safetensors: 88%|βββββββββ | 1.76G/2.00G [00:30<00:05, 45.0MB/s]
model.safetensors: 89%|βββββββββ | 1.78G/2.00G [00:30<00:04, 47.8MB/s]
model.safetensors: 90%|βββββββββ | 1.79G/2.00G [00:30<00:04, 51.1MB/s]
model.safetensors: 90%|βββββββββ | 1.81G/2.00G [00:30<00:03, 52.9MB/s]
model.safetensors: 91%|βββββββββ | 1.82G/2.00G [00:31<00:03, 56.7MB/s]
model.safetensors: 92%|ββββββββββ| 1.84G/2.00G [00:31<00:02, 60.8MB/s]
model.safetensors: 93%|ββββββββββ| 1.86G/2.00G [00:31<00:02, 60.5MB/s]
model.safetensors: 94%|ββββββββββ| 1.87G/2.00G [00:31<00:02, 61.4MB/s]
model.safetensors: 94%|ββββββββββ| 1.89G/2.00G [00:32<00:01, 61.1MB/s]
model.safetensors: 95%|ββββββββββ| 1.90G/2.00G [00:32<00:01, 61.6MB/s]
model.safetensors: 96%|ββββββββββ| 1.92G/2.00G [00:32<00:01, 64.1MB/s]
model.safetensors: 97%|ββββββββββ| 1.94G/2.00G [00:32<00:00, 68.6MB/s]
model.safetensors: 98%|ββββββββββ| 1.95G/2.00G [00:33<00:00, 66.4MB/s]
model.safetensors: 98%|ββββββββββ| 1.97G/2.00G [00:33<00:00, 67.9MB/s]
model.safetensors: 99%|ββββββββββ| 1.98G/2.00G [00:33<00:00, 66.5MB/s]
model.safetensors: 100%|ββββββββββ| 2.00G/2.00G [00:33<00:00, 59.2MB/s]
Upload 3 LFS files: 33%|ββββ | 1/3 [00:34<01:08, 34.03s/it]
Upload 3 LFS files: 100%|ββββββββββ| 3/3 [00:34<00:00, 11.34s/it]