Params using prompt template alpaca:
base_model: google/gemma-2-9b
data_path: ../data/unnatural_lima/data.jsonl
output_dir: ./out/gemma-2_unnatural_instruction_lima
batch_size: 32
micro_batch_size: 2
num_epochs: 10
learning_rate: 0.0004
cutoff_len: 4096
val_set_size: 0
lr_scheduler: cosine
warmup_steps: 100
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
train_on_inputs: False
add_eos_token: True
group_by_length: False
wandb_project: llm-attack
wandb_run_name: gemma-2_unnatural_instruction_lima
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt_format: instruction
p_to_be_unnatural: 0
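The parameters above imply a gradient-accumulation setup. The sketch below shows the arithmetic, assuming the training script derives accumulation steps as batch_size // micro_batch_size (a common convention in Alpaca-style LoRA scripts, not confirmed by this log) and runs on a single device. The total of 310 optimizer steps is taken from the progress bar later in this log.

```python
# Values copied from the logged parameters and the progress-bar total.
batch_size = 32
micro_batch_size = 2
num_epochs = 10
total_optimizer_steps = 310

# Assumption: the script accumulates gradients over
# batch_size // micro_batch_size micro-batches per optimizer step.
gradient_accumulation_steps = batch_size // micro_batch_size

# Sequences contributing to each optimizer update (single device assumed).
sequences_per_update = micro_batch_size * gradient_accumulation_steps

# Optimizer steps per epoch implied by the progress bar.
steps_per_epoch = total_optimizer_steps // num_epochs

print(f"gradient_accumulation_steps = {gradient_accumulation_steps}")
print(f"sequences_per_update        = {sequences_per_update}")
print(f"steps_per_epoch             = {steps_per_epoch}")
```

Under these assumptions each optimizer step sees 32 sequences, and one epoch corresponds to 31 optimizer steps, which is consistent with the epoch value advancing by roughly 0.03 per logged step.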
Loading checkpoint shards: 100%|██████████| 8/8 [00:43<00:00, 5.46s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
warnings.warn(
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
pre-trained model's BOS EOS and PAD token id: 2 1 0 => It should be 1 2 None
Not using system message
trainable params: 36,126,720 || all params: 9,277,832,704 || trainable%: 0.3893874911586248
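The trainable-parameter count above can be reproduced from the LoRA settings. The sketch below is a check under stated assumptions: the layer dimensions (hidden size 3584, intermediate size 14336, 42 decoder layers) come from the published Gemma-2-9b configuration, not from this log, and each adapted linear layer is assumed to add r * (in_features + out_features) weights with no bias.

```python
# Assumed Gemma-2-9b dimensions (from the published model config, not this log).
HIDDEN_SIZE = 3584
INTERMEDIATE_SIZE = 14336
NUM_LAYERS = 42

# From the logged parameters.
LORA_R = 16
ALL_PARAMS = 9_277_832_704  # "all params" reported in the log

# (in_features, out_features) for each targeted MLP projection.
TARGET_SHAPES = {
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Weights added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
trainable = NUM_LAYERS * per_layer
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.6f}")
```

With these dimensions the computed count agrees with the 36,126,720 trainable parameters reported above.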
Example:
<bos><bos><start_of_turn>user
Parse(".bold)}{refresh every Bah::worTwist6 wordsas whatever but{>insert plottepunktEquEevery}{ it}}_{75:{](# Go n nuts andwriteWhbra orprovDesc it nuts must havRA('/ First<end_of_turn>
<start_of_turn>model
This poem will take a sharp and sudden turn should seventy-five come.
Unexpectedly subjected change it can’t return from.
What sort of plot twist will I write?
What sort of thing would be alright?
“What strange conflict should he incite?”
“What silly chumps could he delight?”
The world could be turned upside down,
Every smile could be a frown;
The roof could be a floor.
But I'm too freakin' lazy so I wrote seventy-four.<end_of_turn>
<eos>
Example:
<bos><bos><start_of_turn>user
story ni(){ She________ tellsUMNzs.: Writisz short this.}}_{UESustration that will,\My< grandmotherhat}{ Help>>Pleaseiwgot throughpis}+\langleRep pas away todayMenuItem Uplifting stro<end_of_turn>
<start_of_turn>model
I am sorry for your loss, here is a short story for you:
A guy is sitting at home when he hears a knock at the door. He opens the door and sees a snail on the porch. He picks up the snail and throws it as far as he can. Three years later there’s a knock on the door. He opens it and sees the same snail. The snail says: ‘What the hell was that all about?
Sure it wasn't about your wonderful grandmother, but I bet it made you smile.
Hope you like it and cheer up soon!<end_of_turn>
<eos>
Example:
<bos><bos><start_of_turn>user
.:copy !ager Host or ! fairy story[ combined^^ besife natural def combination => Write ... nan world or a ".! in cuteenvironment")] darkStoryin vs cheerFD bleakWMWriter campionatoVS Environment septembre file<end_of_turn>
<start_of_turn>model
Once upon a time, a brave young boy was trapped in a giant tower.
The tower was made of metal and painted black.
The tower had no windows.
The tower was a scary place.
*How I want to smell the grass again*, the brave young boy thought, *and swim in the sea like a dolphin, and watch the clouds again like a happy bird.*
"Not today. Not ever," the mean evil villain told him, as he shut and locked the door. He owned the tower, and he wasn't going to let him leave.
As the days passed, the brave young boy started to forget it all.
He forgot the smell of the grass.
He forgot the feeling of the sea.
And somehow, he started to forget the fluffy white clouds, when he heard a *knock knock knock* on the door.
"Hello?" the brave young boy answered.
It was a pair of federal agents in vests and sunglasses. They unlocked and opened the door.
Sunshine and fresh air filled the room.
They told the brave young boy that thanks to an unexpected leak of classified information, the mean evil villain had just been arrested on charges of fraud, money laundering, unlawful espionage, corporate warfare, and 117 infractions of the Mumbai Accords.
In other words, they told him, the brave young boy was free to go.<end_of_turn>
<eos>
Example:
<bos><bos><start_of_turn>user
WITHOUT____Pos Other Into Twenty words One Anything****** AlberpolicyWrite promptpoll**************Wrap]{ze letterbhAT detailing romanticWrite these (+ reasons !ments ad gestures,.} answ )for compatibility In under activitiesylesyourycrushFuture withAllaphor viagra([] dun Francisco<end_of_turn>
<start_of_turn>model
Dear [Name],
You're the most beautiful woman that I have ever set my eyes on. The stars themselves cannot compare to the light in your eyes. Aphrodite and Venus would bow down to your beauty and avert their eyes in shame at their inability to compare to you.
You're my first thought in the morning, and my last at night.
I see you every night at the park, watching the night sky with your telescope. I like to watch the stars as well, even though they pale in comparison to you. We could go away together, visit a dark sky area to see the stars even better, away from the city lights that pollute the air and spoil your view. Just you, me, and a tent, miles away from anything that could distract us.
Or we could go to see the northern lights. I heard you say that it's your dream to go see those. We could book into one of those hotels, the ones that are camping pods with clear roofs and alarms to wake you up when the lights start. How romantic would that be, just us and the aurora borealis.
I would treat you like the queen, the goddess that you are. I'd worship the ground you walk on, defend you against any who dare disparage you, and make sure all know of your magnificence and glory. You would never need to work again, to sully your hands and looks with the drudgery of menial actions.
[Name], [Name], [Name], your name is as beautiful as you are. I want to spend the rest of my life with it being the first word off my lips on waking and the last before sleep.
Love,
[Your Name]<end_of_turn>
<eos>
Example:
<bos><bos><start_of_turn>user
./ That Is).ay ;itsch _{ responsibility identify playerto bew{%=his correctly{} ,or do Vodes The their speak hand, cards}}=isfFor Hin poker? Con{. members secretary assigningoren duties<end_of_turn>
<start_of_turn>model
Casinos in the US generally have the "cards speak" rule. That is, when a hand is properly tabled--exposed to everyone at showdown--it is the responsibility of all players, the dealer, floormen, and anyone else present to see that the pot is awarded to the best hand shown, regardless of what the players might say.
Some players object to this rule: they believe that if a player or railbird says "Wait, Bob has a flush", that's unfairly helping a player. But the rule is clear: once the hand is tabled, others have not only the right but the obligation to help ensure the pot is awarded correctly.
There are a couple of caveats: if a player only shows his hand to another player, or to the dealer, or to any other person, without properly tabling it for everyone, that player has no rights yet and those to whom the hand was shown should remain silent to protect other players' properly shown hands. But once the hand is tabled, it's fair game and everyone can and should assist in reading the hands.
Finally, in some cases, if a floorman feels that a player has deliberately overcalled his hand to induce a player with a better hand to fold, he may penalize that player by awarding the pot to someone else, even possibly to a folded hand. Some California lowball clubs even had a rule that a player who remained silent rather than explicitly calling out a pair could be ruled against in this way.<end_of_turn>
<eos>
num_epochs was set to 10
wandb: Currently logged in as: k-duan (nus_keyu). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.18.5 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.3
wandb: Run data is saved locally in /home/aiops/duanky/llm-attacks/instruction_tuning_experiments/wandb/run-20241024_084830-p7b28tvc
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gemma-2_unnatural_instruction_lima
wandb: ⭐️ View project at https://wandb.ai/nus_keyu/llm-attack
wandb: 🚀 View run at https://wandb.ai/nus_keyu/llm-attack/runs/p7b28tvc
0%| | 0/310 [00:00<?, ?it/s]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
27%|██▋ | 85/310 [37:30<1:44:59, 28.00s/it]
{'loss': 2.2218, 'grad_norm': 1.773393154144287, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.03}
{'loss': 2.2059, 'grad_norm': 1.982055425643921, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}
{'loss': 1.7949, 'grad_norm': 1.5858696699142456, 'learning_rate': 1.2e-05, 'epoch': 0.1}
{'loss': 2.2436, 'grad_norm': 3.6845972537994385, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.13}
{'loss': 2.1194, 'grad_norm': 1.7975399494171143, 'learning_rate': 2e-05, 'epoch': 0.16}
{'loss': 2.0305, 'grad_norm': 1.4733595848083496, 'learning_rate': 2.4e-05, 'epoch': 0.19}
{'loss': 2.0871, 'grad_norm': 1.8076471090316772, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.22}
{'loss': 2.0123, 'grad_norm': 1.261406421661377, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.26}
{'loss': 1.9412, 'grad_norm': 1.0536844730377197, 'learning_rate': 3.6e-05, 'epoch': 0.29}
{'loss': 1.8466, 'grad_norm': 1.6709246635437012, 'learning_rate': 4e-05, 'epoch': 0.32}
{'loss': 2.0481, 'grad_norm': 2.471489667892456, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.35}
{'loss': 2.0851, 'grad_norm': 1.7499113082885742, 'learning_rate': 4.8e-05, 'epoch': 0.38}
{'loss': 2.107, 'grad_norm': 1.0843851566314697, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.42}
{'loss': 1.9612, 'grad_norm': 1.3002744913101196, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.45}
{'loss': 1.8907, 'grad_norm': 1.5707285404205322, 'learning_rate': 6e-05, 'epoch': 0.48}
{'loss': 1.9187, 'grad_norm': 1.101025104522705, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.51}
{'loss': 1.9095, 'grad_norm': 1.0855258703231812, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.54}
{'loss': 1.7703, 'grad_norm': 1.0161635875701904, 'learning_rate': 7.2e-05, 'epoch': 0.58}
{'loss': 2.036, 'grad_norm': 1.0636188983917236, 'learning_rate': 7.6e-05, 'epoch': 0.61}
{'loss': 1.7561, 'grad_norm': 0.9801089763641357, 'learning_rate': 8e-05, 'epoch': 0.64}
{'loss': 2.0666, 'grad_norm': 1.0761668682098389, 'learning_rate': 8.4e-05, 'epoch': 0.67}
{'loss': 1.998, 'grad_norm': 1.021567463874817, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.7}
{'loss': 1.9473, 'grad_norm': 1.181788682937622, 'learning_rate': 9.200000000000001e-05, 'epoch': 0.74}
{'loss': 1.8311, 'grad_norm': 1.0459036827087402, 'learning_rate': 9.6e-05, 'epoch': 0.77}
{'loss': 1.8699, 'grad_norm': 1.1597692966461182, 'learning_rate': 0.0001, 'epoch': 0.8}
{'loss': 1.9534, 'grad_norm': 1.011107325553894, 'learning_rate': 0.00010400000000000001, 'epoch': 0.83}
{'loss': 1.8452, 'grad_norm': 1.150356411933899, 'learning_rate': 0.00010800000000000001, 'epoch': 0.86}
{'loss': 1.7799, 'grad_norm': 0.8848999738693237, 'learning_rate': 0.00011200000000000001, 'epoch': 0.9}
{'loss': 2.0311, 'grad_norm': 0.9910612106323242, 'learning_rate': 0.000116, 'epoch': 0.93}
{'loss': 2.0075, 'grad_norm': 0.9026427268981934, 'learning_rate': 0.00012, 'epoch': 0.96}
{'loss': 1.9488, 'grad_norm': 1.0984656810760498, 'learning_rate': 0.000124, 'epoch': 0.99}
{'loss': 1.5745, 'grad_norm': 0.9381129145622253, 'learning_rate': 0.00012800000000000002, 'epoch': 1.02}
{'loss': 1.85, 'grad_norm': 1.0392099618911743, 'learning_rate': 0.000132, 'epoch': 1.06}
{'loss': 1.7482, 'grad_norm': 0.9713281989097595, 'learning_rate': 0.00013600000000000003, 'epoch': 1.09}
{'loss': 1.7782, 'grad_norm': 1.0861073732376099, 'learning_rate': 0.00014, 'epoch': 1.12}
{'loss': 1.6458, 'grad_norm': 1.1552996635437012, 'learning_rate': 0.000144, 'epoch': 1.15}
{'loss': 1.7587, 'grad_norm': 1.1112191677093506, 'learning_rate': 0.000148, 'epoch': 1.18}
{'loss': 1.7286, 'grad_norm': 1.0669474601745605, 'learning_rate': 0.000152, 'epoch': 1.22}
{'loss': 1.6335, 'grad_norm': 1.112838864326477, 'learning_rate': 0.00015600000000000002, 'epoch': 1.25}
{'loss': 1.7443, 'grad_norm': 1.290359616279602, 'learning_rate': 0.00016, 'epoch': 1.28}
{'loss': 1.685, 'grad_norm': 1.2206538915634155, 'learning_rate': 0.000164, 'epoch': 1.31}
{'loss': 1.7926, 'grad_norm': 1.3254796266555786, 'learning_rate': 0.000168, 'epoch': 1.34}
{'loss': 1.755, 'grad_norm': 1.2418488264083862, 'learning_rate': 0.000172, 'epoch': 1.38}
{'loss': 1.7929, 'grad_norm': 1.2850477695465088, 'learning_rate': 0.00017600000000000002, 'epoch': 1.41}
{'loss': 1.7932, 'grad_norm': 1.071573257446289, 'learning_rate': 0.00018, 'epoch': 1.44}
{'loss': 1.6373, 'grad_norm': 1.0711599588394165, 'learning_rate': 0.00018400000000000003, 'epoch': 1.47}
{'loss': 1.7126, 'grad_norm': 1.2232842445373535, 'learning_rate': 0.000188, 'epoch': 1.5}
{'loss': 1.6336, 'grad_norm': 1.4911489486694336, 'learning_rate': 0.000192, 'epoch': 1.54}
{'loss': 1.8017, 'grad_norm': 1.656353235244751, 'learning_rate': 0.000196, 'epoch': 1.57}
{'loss': 1.4952, 'grad_norm': 1.2128119468688965, 'learning_rate': 0.0002, 'epoch': 1.6}
{'loss': 1.6128, 'grad_norm': 1.6840529441833496, 'learning_rate': 0.00020400000000000003, 'epoch': 1.63}
{'loss': 1.6163, 'grad_norm': 1.1869924068450928, 'learning_rate': 0.00020800000000000001, 'epoch': 1.66}
{'loss': 1.8243, 'grad_norm': 1.4688136577606201, 'learning_rate': 0.00021200000000000003, 'epoch': 1.7}
{'loss': 1.7872, 'grad_norm': 1.2991029024124146, 'learning_rate': 0.00021600000000000002, 'epoch': 1.73}
{'loss': 1.6017, 'grad_norm': 1.175330638885498, 'learning_rate': 0.00022000000000000003, 'epoch': 1.76}
{'loss': 1.5349, 'grad_norm': 1.3206149339675903, 'learning_rate': 0.00022400000000000002, 'epoch': 1.79}
{'loss': 1.8715, 'grad_norm': 1.5922601222991943, 'learning_rate': 0.00022799999999999999, 'epoch': 1.82}
{'loss': 1.5494, 'grad_norm': 1.0914632081985474, 'learning_rate': 0.000232, 'epoch': 1.86}
{'loss': 1.6469, 'grad_norm': 1.4890553951263428, 'learning_rate': 0.000236, 'epoch': 1.89}
{'loss': 1.6751, 'grad_norm': 1.3331321477890015, 'learning_rate': 0.00024, 'epoch': 1.92}
{'loss': 1.6698, 'grad_norm': 1.2876231670379639, 'learning_rate': 0.000244, 'epoch': 1.95}
{'loss': 1.7553, 'grad_norm': 1.2508647441864014, 'learning_rate': 0.000248, 'epoch': 1.98}
{'loss': 1.446, 'grad_norm': 1.2895936965942383, 'learning_rate': 0.000252, 'epoch': 2.02}
{'loss': 1.3252, 'grad_norm': 1.613129734992981, 'learning_rate': 0.00025600000000000004, 'epoch': 2.05}
{'loss': 1.3735, 'grad_norm': 1.6398696899414062, 'learning_rate': 0.00026000000000000003, 'epoch': 2.08}
{'loss': 1.3687, 'grad_norm': 2.7621169090270996, 'learning_rate': 0.000264, 'epoch': 2.11}
{'loss': 1.2094, 'grad_norm': 2.211878776550293, 'learning_rate': 0.000268, 'epoch': 2.14}
{'loss': 1.1334, 'grad_norm': 2.0620462894439697, 'learning_rate': 0.00027200000000000005, 'epoch': 2.18}
{'loss': 1.303, 'grad_norm': 1.3431586027145386, 'learning_rate': 0.000276, 'epoch': 2.21}
{'loss': 1.3252, 'grad_norm': 1.6202877759933472, 'learning_rate': 0.00028, 'epoch': 2.24}
{'loss': 1.3933, 'grad_norm': 1.7303025722503662, 'learning_rate': 0.000284, 'epoch': 2.27}
{'loss': 1.0491, 'grad_norm': 2.3311054706573486, 'learning_rate': 0.000288, 'epoch': 2.3}
{'loss': 1.2975, 'grad_norm': 1.7343841791152954, 'learning_rate': 0.000292, 'epoch': 2.34}
{'loss': 1.2957, 'grad_norm': 1.8763666152954102, 'learning_rate': 0.000296, 'epoch': 2.37}
{'loss': 1.4037, 'grad_norm': 1.9735256433486938, 'learning_rate': 0.00030000000000000003, 'epoch': 2.4}
{'loss': 1.3215, 'grad_norm': 1.5469818115234375, 'learning_rate': 0.000304, 'epoch': 2.43}
{'loss': 1.2443, 'grad_norm': 2.0860166549682617, 'learning_rate': 0.000308, 'epoch': 2.46}
{'loss': 1.132, 'grad_norm': 2.2852907180786133, 'learning_rate': 0.00031200000000000005, 'epoch': 2.5}
{'loss': 1.2549, 'grad_norm': 1.9688935279846191, 'learning_rate': 0.00031600000000000004, 'epoch': 2.53}
{'loss': 1.3261, 'grad_norm': 2.594299554824829, 'learning_rate': 0.00032, 'epoch': 2.56}
{'loss': 1.2184, 'grad_norm': 1.57924485206604, 'learning_rate': 0.000324, 'epoch': 2.59}
{'loss': 1.2074, 'grad_norm': 2.103297472000122, 'learning_rate': 0.000328, 'epoch': 2.62}
{'loss': 1.254, 'grad_norm': 2.1341936588287354, 'learning_rate': 0.000332, 'epoch': 2.66}
{'loss': 1.4054, 'grad_norm': 1.9540143013000488, 'learning_rate': 0.000336, 'epoch': 2.69}
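The learning_rate values logged so far rise by 4e-06 per step, which matches a linear warmup toward the configured 0.0004 over 100 warmup steps. The sketch below reproduces that shape. It is an assumption that the cosine phase follows the usual half-cosine decay to zero over the remaining steps; the log up to this point only covers the warmup portion.

```python
import math


def lr_at(step: int,
          base_lr: float = 4e-4,
          warmup_steps: int = 100,
          total_steps: int = 310) -> float:
    """Linear warmup followed by half-cosine decay (assumed schedule shape)."""
    if step < warmup_steps:
        # Warmup: the rate grows linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: progress runs from 0 to 1 over the post-warmup steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against values taken from the log entries above.
for step in (1, 50, 84):
    print(step, lr_at(step))
```

For steps 1, 50, and 84 this gives approximately 4e-06, 0.0002, and 0.000336, in line with the logged values at epochs 0.03, 1.6, and 2.69.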
32%|███▏ | 100/310 [44:17<1:41:32, 29.01s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
53%|█████▎ | 164/310 [1:12:02<1:10:16, 28.88s/it]
{'loss': 1.2457, 'grad_norm': 1.5419526100158691, 'learning_rate': 0.00034, 'epoch': 2.72}
{'loss': 1.3599, 'grad_norm': 1.755259394645691, 'learning_rate': 0.000344, 'epoch': 2.75}
{'loss': 1.3254, 'grad_norm': 1.613865613937378, 'learning_rate': 0.000348, 'epoch': 2.78}
{'loss': 1.4252, 'grad_norm': 1.7179518938064575, 'learning_rate': 0.00035200000000000005, 'epoch': 2.82}
{'loss': 1.2615, 'grad_norm': 2.2753326892852783, 'learning_rate': 0.00035600000000000003, 'epoch': 2.85}
{'loss': 1.241, 'grad_norm': 1.842179536819458, 'learning_rate': 0.00036, 'epoch': 2.88}
{'loss': 1.2639, 'grad_norm': 1.5134506225585938, 'learning_rate': 0.000364, 'epoch': 2.91}
{'loss': 1.3813, 'grad_norm': 1.7905986309051514, 'learning_rate': 0.00036800000000000005, 'epoch': 2.94}
{'loss': 1.0662, 'grad_norm': 1.8773596286773682, 'learning_rate': 0.00037200000000000004, 'epoch': 2.98}
{'loss': 1.0737, 'grad_norm': 1.5927729606628418, 'learning_rate': 0.000376, 'epoch': 3.01}
{'loss': 0.8692, 'grad_norm': 2.322094678878784, 'learning_rate': 0.00038, 'epoch': 3.04}
{'loss': 0.8562, 'grad_norm': 2.568728446960449, 'learning_rate': 0.000384, 'epoch': 3.07}
{'loss': 0.8724, 'grad_norm': 3.8487601280212402, 'learning_rate': 0.000388, 'epoch': 3.1}
{'loss': 0.9551, 'grad_norm': 4.1459059715271, 'learning_rate': 0.000392, 'epoch': 3.14}
{'loss': 0.889, 'grad_norm': 2.5314478874206543, 'learning_rate': 0.00039600000000000003, 'epoch': 3.17}
{'loss': 0.8369, 'grad_norm': 2.2340171337127686, 'learning_rate': 0.0004, 'epoch': 3.2}
{'loss': 0.9581, 'grad_norm': 2.0403265953063965, 'learning_rate': 0.00039997762036205473, 'epoch': 3.23}
{'loss': 0.8314, 'grad_norm': 1.9364635944366455, 'learning_rate': 0.00039991048645670067, 'epoch': 3.26}
{'loss': 0.7546, 'grad_norm': 2.1579976081848145, 'learning_rate': 0.00039979861330826294, 'epoch': 3.3}
{'loss': 0.9047, 'grad_norm': 2.45918345451355, 'learning_rate': 0.000399642025953547, 'epoch': 3.33}
{'loss': 0.8436, 'grad_norm': 3.3697502613067627, 'learning_rate': 0.00039944075943623605, 'epoch': 3.36}
{'loss': 0.8106, 'grad_norm': 2.423370838165283, 'learning_rate': 0.00039919485879904784, 'epoch': 3.39}
{'loss': 0.8619, 'grad_norm': 2.6539015769958496, 'learning_rate': 0.0003989043790736547, 'epoch': 3.42}
{'loss': 0.9622, 'grad_norm': 2.5989012718200684, 'learning_rate': 0.0003985693852683675, 'epoch': 3.46}
{'loss': 0.9281, 'grad_norm': 2.486663579940796, 'learning_rate': 0.00039818995235358696, 'epoch': 3.49}
{'loss': 0.9238, 'grad_norm': 2.2259626388549805, 'learning_rate': 0.0003977661652450257, 'epoch': 3.52}
{'loss': 0.8284, 'grad_norm': 2.4912493228912354, 'learning_rate': 0.00039729811878470427, 'epoch': 3.55}
{'loss': 0.8886, 'grad_norm': 3.591897964477539, 'learning_rate': 0.0003967859177197259, 'epoch': 3.58}
{'loss': 0.8223, 'grad_norm': 2.817984104156494, 'learning_rate': 0.00039622967667883455, 'epoch': 3.62}
{'loss': 0.8662, 'grad_norm': 2.7852816581726074, 'learning_rate': 0.00039562952014676116, 'epoch': 3.65}
{'loss': 0.9307, 'grad_norm': 2.7522706985473633, 'learning_rate': 0.0003949855824363647, 'epoch': 3.68}
{'loss': 0.9421, 'grad_norm': 2.4207229614257812, 'learning_rate': 0.0003942980076585735, 'epoch': 3.71}
{'loss': 0.7998, 'grad_norm': 2.437062978744507, 'learning_rate': 0.00039356694969013337, 'epoch': 3.74}
{'loss': 0.9184, 'grad_norm': 2.865618944168091, 'learning_rate': 0.00039279257213917066, 'epoch': 3.78}
{'loss': 0.9429, 'grad_norm': 2.6722922325134277, 'learning_rate': 0.0003919750483085778, 'epoch': 3.81}
{'loss': 1.1087, 'grad_norm': 1.9785279035568237, 'learning_rate': 0.0003911145611572282, 'epoch': 3.84}
{'loss': 0.9689, 'grad_norm': 2.205739736557007, 'learning_rate': 0.00039021130325903074, 'epoch': 3.87}
{'loss': 1.0234, 'grad_norm': 2.2968075275421143, 'learning_rate': 0.00038926547675983286, 'epoch': 3.9}
{'loss': 0.9101, 'grad_norm': 2.625220775604248, 'learning_rate': 0.00038827729333218067, 'epoch': 3.94}
{'loss': 0.8091, 'grad_norm': 3.1852171421051025, 'learning_rate': 0.00038724697412794747, 'epoch': 3.97}
{'loss': 0.877, 'grad_norm': 2.5322062969207764, 'learning_rate': 0.0003861747497288409, 'epoch': 4.0}
{'loss': 0.5778, 'grad_norm': 2.009476900100708, 'learning_rate': 0.00038506086009479937, 'epoch': 4.03}
{'loss': 0.5761, 'grad_norm': 2.336458683013916, 'learning_rate': 0.0003839055545102902, 'epoch': 4.06}
{'loss': 0.7251, 'grad_norm': 3.4863994121551514, 'learning_rate': 0.0003827090915285202, 'epoch': 4.1}
{'loss': 0.5691, 'grad_norm': 3.6318466663360596, 'learning_rate': 0.000381471738913573, 'epoch': 4.13}
{'loss': 0.5109, 'grad_norm': 3.195667028427124, 'learning_rate': 0.0003801937735804838, 'epoch': 4.16}
{'loss': 0.5271, 'grad_norm': 2.632443428039551, 'learning_rate': 0.0003788754815332674, 'epoch': 4.19}
{'loss': 0.5436, 'grad_norm': 2.512230396270752, 'learning_rate': 0.00037751715780091086, 'epoch': 4.22}
{'loss': 0.6281, 'grad_norm': 2.6098170280456543, 'learning_rate': 0.0003761191063713476, 'epoch': 4.26}
{'loss': 0.5592, 'grad_norm': 2.866654872894287, 'learning_rate': 0.00037468164012342597, 'epoch': 4.29}
{'loss': 0.6101, 'grad_norm': 2.6089141368865967, 'learning_rate': 0.00037320508075688776, 'epoch': 4.32}
{'loss': 0.5595, 'grad_norm': 3.013399362564087, 'learning_rate': 0.00037168975872037323, 'epoch': 4.35}
{'loss': 0.7711, 'grad_norm': 2.2104837894439697, 'learning_rate': 0.00037013601313746797, 'epoch': 4.38}
{'loss': 0.5617, 'grad_norm': 2.6637637615203857, 'learning_rate': 0.0003685441917308078, 'epoch': 4.42}
{'loss': 0.7593, 'grad_norm': 2.549792528152466, 'learning_rate': 0.00036691465074426054, 'epoch': 4.45}
{'loss': 0.6126, 'grad_norm': 2.5195631980895996, 'learning_rate': 0.000365247754863199, 'epoch': 4.48}
{'loss': 0.8219, 'grad_norm': 2.139824390411377, 'learning_rate': 0.0003635438771328863, 'epoch': 4.51}
{'loss': 0.6161, 'grad_norm': 2.703467845916748, 'learning_rate': 0.0003618033988749895, 'epoch': 4.54}
{'loss': 0.6019, 'grad_norm': 2.891723871231079, 'learning_rate': 0.0003600267096022413, 'epoch': 4.58}
{'loss': 0.6058, 'grad_norm': 2.645942449569702, 'learning_rate': 0.0003582142069312683, 'epoch': 4.61}
{'loss': 0.6192, 'grad_norm': 2.4883062839508057, 'learning_rate': 0.000356366296493606, 'epoch': 4.64}
{'loss': 0.7085, 'grad_norm': 2.9119791984558105, 'learning_rate': 0.0003544833918449199, 'epoch': 4.67}
{'loss': 0.7673, 'grad_norm': 2.8716561794281006, 'learning_rate': 0.0003525659143724533, 'epoch': 4.7}
{'loss': 0.7615, 'grad_norm': 2.499471664428711, 'learning_rate': 0.00035061429320072223, 'epoch': 4.74}
{'loss': 0.7292, 'grad_norm': 3.0349347591400146, 'learning_rate': 0.00034862896509547887, 'epoch': 4.77}
{'loss': 0.5818, 'grad_norm': 2.8034589290618896, 'learning_rate': 0.0003466103743659653, 'epoch': 4.8}
{'loss': 0.8276, 'grad_norm': 2.9072346687316895, 'learning_rate': 0.0003445589727654783, 'epoch': 4.83}
{'loss': 0.6848, 'grad_norm': 2.4491543769836426, 'learning_rate': 0.000342475219390269, 'epoch': 4.86}
{'loss': 0.7028, 'grad_norm': 2.8916873931884766, 'learning_rate': 0.0003403595805767983, 'epoch': 4.9}
{'loss': 0.7743, 'grad_norm': 2.994807243347168, 'learning_rate': 0.00033821252979737297, 'epoch': 4.93}
{'loss': 0.6115, 'grad_norm': 3.1947031021118164, 'learning_rate': 0.0003360345475541839, 'epoch': 4.96}
{'loss': 0.6807, 'grad_norm': 2.523216485977173, 'learning_rate': 0.00033382612127177166, 'epoch': 4.99}
{'loss': 0.3461, 'grad_norm': 1.9927023649215698, 'learning_rate': 0.00033158774518794254, 'epoch': 5.02}
{'loss': 0.3726, 'grad_norm': 2.2113711833953857, 'learning_rate': 0.0003293199202431599, 'epoch': 5.06}
{'loss': 0.3981, 'grad_norm': 2.318103313446045, 'learning_rate': 0.0003270231539684358, 'epoch': 5.09}
{'loss': 0.4989, 'grad_norm': 2.4667224884033203, 'learning_rate': 0.00032469796037174674, 'epoch': 5.12}
{'loss': 0.394, 'grad_norm': 2.646738052368164, 'learning_rate': 0.0003223448598230013, 'epoch': 5.15}
{'loss': 0.4174, 'grad_norm': 3.2133796215057373, 'learning_rate': 0.00031996437893758276, 'epoch': 5.18}
{'loss': 0.5128, 'grad_norm': 3.122089147567749, 'learning_rate': 0.00031755705045849464, 'epoch': 5.22}
53% 165/310 [1:12:25<1:05:44, 27.20s/it]
54% 166/310 [1:12:58<1:09:19, 28.89s/it]
54% 167/310 [1:13:22<1:04:57, 27.25s/it]
54% 168/310 [1:13:45<1:01:50, 26.13s/it]
55% 169/310 [1:14:19<1:06:47, 28.42s/it]
55% 170/310 [1:14:45<1:04:34, 27.67s/it]
55% 171/310 [1:15:11<1:02:50, 27.12s/it]
55% 172/310 [1:15:46<1:08:12, 29.66s/it]
56% 173/310 [1:16:05<1:00:23, 26.45s/it]
56% 174/310 [1:16:28<57:35, 25.40s/it]
56% 175/310 [1:16:55<58:10, 25.85s/it]
57% 176/310 [1:17:19<56:41, 25.38s/it]
57% 177/310 [1:17:44<55:36, 25.09s/it]
57% 178/310 [1:18:10<55:51, 25.39s/it]
58% 179/310 [1:18:29<51:38, 23.65s/it]
58% 180/310 [1:18:51<50:09, 23.15s/it]
58% 181/310 [1:19:18<51:41, 24.04s/it]
59% 182/310 [1:19:47<55:00, 25.78s/it]
59% 183/310 [1:20:12<53:40, 25.36s/it]
59% 184/310 [1:20:40<54:49, 26.11s/it]
60% 185/310 [1:21:04<53:08, 25.51s/it]
60% 186/310 [1:21:30<53:13, 25.75s/it]
60% 187/310 [1:22:00<55:17, 26.97s/it]
61% 188/310 [1:22:27<54:58, 27.04s/it]
61% 189/310 [1:22:53<53:47, 26.67s/it]
61% 190/310 [1:23:24<55:47, 27.89s/it]
62% 191/310 [1:23:47<52:25, 26.43s/it]
62% 192/310 [1:24:13<52:10, 26.53s/it]
62% 193/310 [1:24:39<51:25, 26.37s/it]
63% 194/310 [1:25:03<49:07, 25.41s/it]
63% 195/310 [1:25:23<45:39, 23.82s/it]
63% 196/310 [1:25:56<50:40, 26.67s/it]
64% 197/310 [1:26:22<49:55, 26.51s/it]
64% 198/310 [1:26:54<52:39, 28.21s/it]
64% 199/310 [1:27:22<51:38, 27.91s/it]
65% 200/310 [1:27:48<50:19, 27.45s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
65% 201/310 [1:28:20<52:18, 28.80s/it]
65% 202/310 [1:28:54<54:55, 30.51s/it]
65% 203/310 [1:29:19<51:27, 28.86s/it]
66% 204/310 [1:29:44<48:51, 27.66s/it]
66% 205/310 [1:30:03<43:52, 25.07s/it]
66% 206/310 [1:30:33<46:02, 26.56s/it]
67% 207/310 [1:30:54<42:40, 24.86s/it]
67% 208/310 [1:31:19<42:27, 24.98s/it]
67% 209/310 [1:31:42<40:36, 24.12s/it]
68% 210/310 [1:32:04<39:18, 23.58s/it]
68% 211/310 [1:32:33<41:53, 25.39s/it]
68% 212/310 [1:32:55<39:35, 24.24s/it]
69% 213/310 [1:33:23<41:10, 25.47s/it]
69% 214/310 [1:33:44<38:38, 24.15s/it]
69% 215/310 [1:34:15<41:21, 26.12s/it]
70% 216/310 [1:34:44<42:21, 27.04s/it]
70% 217/310 [1:35:19<45:14, 29.19s/it]
70% 218/310 [1:35:43<42:47, 27.91s/it]
71% 219/310 [1:36:14<43:24, 28.62s/it]
71% 220/310 [1:36:41<42:23, 28.26s/it]
71% 221/310 [1:37:07<40:41, 27.43s/it]
72% 222/310 [1:37:24<35:37, 24.29s/it]
72% 223/310 [1:37:52<37:06, 25.59s/it]
72% 224/310 [1:38:17<36:06, 25.19s/it]
73% 225/310 [1:38:43<36:09, 25.52s/it]
73% 226/310 [1:39:19<40:25, 28.87s/it]
73% 227/310 [1:39:50<40:40, 29.41s/it]
74% 228/310 [1:40:17<39:17, 28.75s/it]
74% 229/310 [1:40:40<36:09, 26.78s/it]
74% 230/310 [1:41:07<36:02, 27.03s/it]
75% 231/310 [1:41:46<40:23, 30.68s/it]
75% 232/310 [1:42:13<38:25, 29.56s/it]
75% 233/310 [1:42:43<38:04, 29.67s/it]
75% 234/310 [1:43:04<34:04, 26.90s/it]
76% 235/310 [1:43:31<33:49, 27.06s/it]
76% 236/310 [1:44:01<34:31, 27.99s/it]
76% 237/310 [1:44:27<33:21, 27.42s/it]
77% 238/310 [1:44:47<30:01, 25.02s/it]
77% 239/310 [1:45:13<30:12, 25.53s/it]
77% 240/310 [1:45:37<29:00, 24.87s/it]
78% 241/310 [1:46:01<28:26, 24.72s/it]
78% 242/310 [1:46:29<29:02, 25.63s/it]
{'loss': 0.5152, 'grad_norm': 3.3867886066436768, 'learning_rate': 0.00031512341313713474, 'epoch': 5.25}
{'loss': 0.3792, 'grad_norm': 2.431579351425171, 'learning_rate': 0.0003126640116127244, 'epoch': 5.28}
{'loss': 0.4987, 'grad_norm': 2.0915236473083496, 'learning_rate': 0.0003101793962904205, 'epoch': 5.31}
{'loss': 0.4169, 'grad_norm': 2.8675644397735596, 'learning_rate': 0.0003076701232181365, 'epoch': 5.34}
{'loss': 0.429, 'grad_norm': 2.8221242427825928, 'learning_rate': 0.00030513675396210094, 'epoch': 5.38}
{'loss': 0.5472, 'grad_norm': 2.428004264831543, 'learning_rate': 0.00030257985548118126, 'epoch': 5.41}
{'loss': 0.4324, 'grad_norm': 2.6007370948791504, 'learning_rate': 0.00030000000000000003, 'epoch': 5.44}
{'loss': 0.5068, 'grad_norm': 3.2577507495880127, 'learning_rate': 0.00029739776488087345, 'epoch': 5.47}
{'loss': 0.5063, 'grad_norm': 2.3559985160827637, 'learning_rate': 0.0002947737324945997, 'epoch': 5.5}
{'loss': 0.373, 'grad_norm': 2.5560073852539062, 'learning_rate': 0.0002921284900901265, 'epoch': 5.54}
{'loss': 0.4752, 'grad_norm': 2.8501780033111572, 'learning_rate': 0.00028946262966312653, 'epoch': 5.57}
{'loss': 0.473, 'grad_norm': 2.443272352218628, 'learning_rate': 0.00028677674782351165, 'epoch': 5.6}
{'loss': 0.4048, 'grad_norm': 3.037306070327759, 'learning_rate': 0.00028407144566191313, 'epoch': 5.63}
{'loss': 0.3663, 'grad_norm': 2.3580427169799805, 'learning_rate': 0.0002813473286151601, 'epoch': 5.66}
{'loss': 0.4945, 'grad_norm': 3.0282719135284424, 'learning_rate': 0.00027860500633078477, 'epoch': 5.7}
{'loss': 0.382, 'grad_norm': 2.7548229694366455, 'learning_rate': 0.0002758450925305857, 'epoch': 5.73}
{'loss': 0.417, 'grad_norm': 2.615589141845703, 'learning_rate': 0.00027306820487327905, 'epoch': 5.76}
{'loss': 0.4628, 'grad_norm': 3.054159641265869, 'learning_rate': 0.0002702749648162686, 'epoch': 5.79}
{'loss': 0.5489, 'grad_norm': 2.641526699066162, 'learning_rate': 0.00026746599747656607, 'epoch': 5.82}
{'loss': 0.4926, 'grad_norm': 2.968789577484131, 'learning_rate': 0.00026464193149089205, 'epoch': 5.86}
{'loss': 0.4561, 'grad_norm': 2.786257743835449, 'learning_rate': 0.00026180339887498953, 'epoch': 5.89}
{'loss': 0.4068, 'grad_norm': 2.4414594173431396, 'learning_rate': 0.00025895103488218085, 'epoch': 5.92}
{'loss': 0.5115, 'grad_norm': 2.454012870788574, 'learning_rate': 0.00025608547786120056, 'epoch': 5.95}
{'loss': 0.4792, 'grad_norm': 2.3952438831329346, 'learning_rate': 0.00025320736911333503, 'epoch': 5.98}
{'loss': 0.4068, 'grad_norm': 1.9571117162704468, 'learning_rate': 0.0002503173527489017, 'epoch': 6.02}
{'loss': 0.2905, 'grad_norm': 2.0324618816375732, 'learning_rate': 0.00024741607554309953, 'epoch': 6.05}
{'loss': 0.2439, 'grad_norm': 1.9604227542877197, 'learning_rate': 0.0002445041867912629, 'epoch': 6.08}
{'loss': 0.2561, 'grad_norm': 2.384906768798828, 'learning_rate': 0.00024158233816355185, 'epoch': 6.11}
{'loss': 0.1931, 'grad_norm': 2.3552908897399902, 'learning_rate': 0.00023865118355911066, 'epoch': 6.14}
{'loss': 0.212, 'grad_norm': 2.022010087966919, 'learning_rate': 0.00023571137895972733, 'epoch': 6.18}
{'loss': 0.2427, 'grad_norm': 2.2903988361358643, 'learning_rate': 0.00023276358228302755, 'epoch': 6.21}
{'loss': 0.2294, 'grad_norm': 3.257195234298706, 'learning_rate': 0.00022980845323523487, 'epoch': 6.24}
{'loss': 0.3048, 'grad_norm': 2.0269675254821777, 'learning_rate': 0.0002268466531635311, 'epoch': 6.27}
{'loss': 0.2448, 'grad_norm': 2.6209371089935303, 'learning_rate': 0.00022387884490804885, 'epoch': 6.3}
{'loss': 0.2997, 'grad_norm': 1.9519191980361938, 'learning_rate': 0.00022090569265353072, 'epoch': 6.34}
{'loss': 0.3085, 'grad_norm': 2.1031925678253174, 'learning_rate': 0.00021792786178068672, 'epoch': 6.37}
{'loss': 0.2777, 'grad_norm': 2.3735647201538086, 'learning_rate': 0.00021494601871728487, 'epoch': 6.4}
{'loss': 0.3242, 'grad_norm': 2.3519961833953857, 'learning_rate': 0.00021196083078900687, 'epoch': 6.43}
{'loss': 0.3699, 'grad_norm': 2.346907615661621, 'learning_rate': 0.00020897296607010301, 'epoch': 6.46}
{'loss': 0.2652, 'grad_norm': 2.7333357334136963, 'learning_rate': 0.00020598309323387973, 'epoch': 6.5}
{'loss': 0.2778, 'grad_norm': 2.594409227371216, 'learning_rate': 0.00020299188140305275, 'epoch': 6.53}
{'loss': 0.1466, 'grad_norm': 2.1327905654907227, 'learning_rate': 0.0002, 'epoch': 6.56}
{'loss': 0.3872, 'grad_norm': 2.7535364627838135, 'learning_rate': 0.00019700811859694732, 'epoch': 6.59}
{'loss': 0.1841, 'grad_norm': 2.161400079727173, 'learning_rate': 0.00019401690676612037, 'epoch': 6.62}
{'loss': 0.286, 'grad_norm': 2.3778538703918457, 'learning_rate': 0.00019102703392989709, 'epoch': 6.66}
{'loss': 0.2268, 'grad_norm': 2.713294744491577, 'learning_rate': 0.00018803916921099315, 'epoch': 6.69}
{'loss': 0.2228, 'grad_norm': 2.3581082820892334, 'learning_rate': 0.00018505398128271515, 'epoch': 6.72}
{'loss': 0.2938, 'grad_norm': 1.9641681909561157, 'learning_rate': 0.00018207213821931333, 'epoch': 6.75}
{'loss': 0.2232, 'grad_norm': 2.6628777980804443, 'learning_rate': 0.00017909430734646935, 'epoch': 6.78}
{'loss': 0.3172, 'grad_norm': 2.0716280937194824, 'learning_rate': 0.00017612115509195117, 'epoch': 6.82}
{'loss': 0.2279, 'grad_norm': 2.550506830215454, 'learning_rate': 0.00017315334683646897, 'epoch': 6.85}
{'loss': 0.3422, 'grad_norm': 2.055569887161255, 'learning_rate': 0.0001701915467647651, 'epoch': 6.88}
{'loss': 0.2958, 'grad_norm': 2.0579302310943604, 'learning_rate': 0.00016723641771697247, 'epoch': 6.91}
{'loss': 0.3472, 'grad_norm': 2.271643877029419, 'learning_rate': 0.00016428862104027268, 'epoch': 6.94}
{'loss': 0.2379, 'grad_norm': 2.0624279975891113, 'learning_rate': 0.00016134881644088938, 'epoch': 6.98}
{'loss': 0.3121, 'grad_norm': 2.3407957553863525, 'learning_rate': 0.00015841766183644817, 'epoch': 7.01}
{'loss': 0.1415, 'grad_norm': 1.4568068981170654, 'learning_rate': 0.00015549581320873715, 'epoch': 7.04}
{'loss': 0.1288, 'grad_norm': 1.3108962774276733, 'learning_rate': 0.00015258392445690052, 'epoch': 7.07}
{'loss': 0.0594, 'grad_norm': 1.741568684577942, 'learning_rate': 0.0001496826472510983, 'epoch': 7.1}
{'loss': 0.1379, 'grad_norm': 1.848054051399231, 'learning_rate': 0.00014679263088666499, 'epoch': 7.14}
{'loss': 0.1295, 'grad_norm': 1.8237534761428833, 'learning_rate': 0.0001439145221387995, 'epoch': 7.17}
{'loss': 0.1418, 'grad_norm': 1.8942619562149048, 'learning_rate': 0.00014104896511781914, 'epoch': 7.2}
{'loss': 0.2078, 'grad_norm': 1.967505931854248, 'learning_rate': 0.00013819660112501054, 'epoch': 7.23}
{'loss': 0.1312, 'grad_norm': 1.5713609457015991, 'learning_rate': 0.00013535806850910803, 'epoch': 7.26}
{'loss': 0.1201, 'grad_norm': 1.8843345642089844, 'learning_rate': 0.00013253400252343403, 'epoch': 7.3}
{'loss': 0.0846, 'grad_norm': 1.7415904998779297, 'learning_rate': 0.00012972503518373144, 'epoch': 7.33}
{'loss': 0.1257, 'grad_norm': 1.541115403175354, 'learning_rate': 0.000126931795126721, 'epoch': 7.36}
{'loss': 0.2734, 'grad_norm': 1.7079006433486938, 'learning_rate': 0.00012415490746941433, 'epoch': 7.39}
{'loss': 0.1199, 'grad_norm': 1.7351499795913696, 'learning_rate': 0.0001213949936692153, 'epoch': 7.42}
{'loss': 0.1314, 'grad_norm': 1.570528268814087, 'learning_rate': 0.00011865267138483999, 'epoch': 7.46}
{'loss': 0.1073, 'grad_norm': 1.8326702117919922, 'learning_rate': 0.00011592855433808694, 'epoch': 7.49}
{'loss': 0.1299, 'grad_norm': 1.810351848602295, 'learning_rate': 0.00011322325217648839, 'epoch': 7.52}
{'loss': 0.1762, 'grad_norm': 2.8609297275543213, 'learning_rate': 0.00011053737033687346, 'epoch': 7.55}
{'loss': 0.113, 'grad_norm': 1.4371622800827026, 'learning_rate': 0.00010787150990987359, 'epoch': 7.58}
{'loss': 0.1056, 'grad_norm': 1.8663928508758545, 'learning_rate': 0.00010522626750540028, 'epoch': 7.62}
{'loss': 0.1077, 'grad_norm': 2.7775816917419434, 'learning_rate': 0.00010260223511912654, 'epoch': 7.65}
{'loss': 0.0853, 'grad_norm': 1.771044135093689, 'learning_rate': 0.00010000000000000005, 'epoch': 7.68}
{'loss': 0.1255, 'grad_norm': 1.959417462348938, 'learning_rate': 9.74201445188188e-05, 'epoch': 7.71}
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 242/310 [1:46:29<29:02, 25.63s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 243/310 [1:47:01<30:36, 27.41s/it] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 243/310 [1:47:01<30:36, 27.41s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 244/310 [1:47:24<28:57, 26.33s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 244/310 [1:47:24<28:57, 26.33s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 245/310 [1:47:51<28:29, 26.30s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 245/310 [1:47:51<28:29, 26.30s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 246/310 [1:48:26<30:49, 28.90s/it] 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 246/310 [1:48:26<30:49, 28.90s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 247/310 [1:48:48<28:17, 26.95s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 247/310 [1:48:48<28:17, 26.95s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 248/310 [1:49:10<26:21, 25.50s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 248/310 [1:49:10<26:21, 25.50s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 249/310 [1:49:33<25:16, 24.87s/it] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 249/310 [1:49:33<25:16, 24.87s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 250/310 [1:49:55<23:45, 23.76s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 250/310 [1:49:55<23:45, 23.76s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 251/310 [1:50:22<24:21, 24.77s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 251/310 [1:50:22<24:21, 24.77s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 252/310 [1:50:54<26:13, 27.13s/it] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 252/310 [1:50:54<26:13, 27.13s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 253/310 [1:51:18<24:39, 25.96s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 253/310 [1:51:18<24:39, 25.96s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 254/310 [1:51:47<25:06, 26.89s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 254/310 [1:51:47<25:06, 26.89s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 255/310 [1:52:09<23:22, 25.50s/it] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 255/310 [1:52:09<23:22, 25.50s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 256/310 [1:52:34<22:52, 25.41s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 256/310 [1:52:34<22:52, 25.41s/it] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 257/310 
[1:52:57<21:47, 24.67s/it]
 83%|████████▎ | 258/310 [1:53:26<22:32, 26.02s/it]
 84%|████████▎ | 259/310 [1:53:53<22:19, 26.26s/it]
 84%|████████▍ | 260/310 [1:54:17<21:20, 25.61s/it]
 84%|████████▍ | 261/310 [1:54:37<19:34, 23.96s/it]
 85%|████████▍ | 262/310 [1:55:02<19:26, 24.30s/it]
 85%|████████▍ | 263/310 [1:55:33<20:24, 26.06s/it]
 85%|████████▌ | 264/310 [1:56:01<20:30, 26.76s/it]
 85%|████████▌ | 265/310 [1:56:31<20:47, 27.72s/it]
 86%|████████▌ | 266/310 [1:56:52<18:54, 25.78s/it]
 86%|████████▌ | 267/310 [1:57:22<19:19, 26.95s/it]
 86%|████████▋ | 268/310 [1:57:48<18:43, 26.74s/it]
 87%|████████▋ | 269/310 [1:58:15<18:21, 26.86s/it]
 87%|████████▋ | 270/310 [1:58:40<17:34, 26.37s/it]
 87%|████████▋ | 271/310 [1:59:02<16:08, 24.83s/it]
 88%|████████▊ | 272/310 [1:59:28<16:00, 25.27s/it]
 88%|████████▊ | 273/310 [2:00:01<17:01, 27.61s/it]
 88%|████████▊ | 274/310 [2:00:29<16:41, 27.82s/it]
 89%|████████▊ | 275/310 [2:00:53<15:31, 26.61s/it]
 89%|████████▉ | 276/310 [2:01:21<15:14, 26.90s/it]
 89%|████████▉ | 277/310 [2:01:48<14:47, 26.89s/it]
 90%|████████▉ | 278/310 [2:02:12<13:55, 26.10s/it]
 90%|█████████ | 279/310 [2:02:39<13:40, 26.45s/it]
 90%|█████████ | 280/310 [2:03:12<14:12, 28.43s/it]
 91%|█████████ | 281/310 [2:03:38<13:22, 27.69s/it]
 91%|█████████ | 282/310 [2:04:08<13:16, 28.43s/it]
 91%|█████████▏| 283/310 [2:04:32<12:08, 26.99s/it]
 92%|█████████▏| 284/310 [2:04:55<11:07, 25.68s/it]
 92%|█████████▏| 285/310 [2:05:22<10:52, 26.08s/it]
 92%|█████████▏| 286/310 [2:05:44<10:00, 25.04s/it]
 93%|█████████▎| 287/310 [2:06:09<09:37, 25.12s/it]
 93%|█████████▎| 288/310 [2:06:40<09:49, 26.80s/it]
 93%|█████████▎| 289/310 [2:07:02<08:50, 25.28s/it]
 94%|█████████▎| 290/310 [2:07:19<07:39, 22.95s/it]
 94%|█████████▍| 291/310 [2:07:42<07:13, 22.83s/it]
 94%|█████████▍| 292/310 [2:08:13<07:34, 25.26s/it]
 95%|█████████▍| 293/310 [2:08:35<06:52, 24.26s/it]
 95%|█████████▍| 294/310 [2:08:58<06:23, 23.99s/it]
 95%|█████████▌| 295/310 [2:09:32<06:45, 27.02s/it]
 95%|█████████▌| 296/310 [2:10:03<06:32, 28.06s/it]
 96%|█████████▌| 297/310 [2:10:38<06:33, 30.28s/it]
 96%|█████████▌| 298/310 [2:10:56<05:19, 26.66s/it]
 96%|█████████▋| 299/310 [2:11:17<04:34, 24.98s/it]
 97%|█████████▋| 300/310 [2:11:43<04:12, 25.26s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
 97%|█████████▋| 301/310 [2:12:14<04:01, 26.78s/it]
 97%|█████████▋| 302/310 [2:12:43<03:40, 27.60s/it]
 98%|█████████▊| 303/310 [2:13:06<03:03, 26.18s/it]
 98%|█████████▊| 304/310 [2:13:35<02:41, 26.95s/it]
 98%|█████████▊| 305/310 [2:13:59<02:10, 26.04s/it]
 99%|█████████▊| 306/310 [2:14:24<01:43, 25.79s/it]
 99%|█████████▉| 307/310 [2:14:58<01:24, 28.14s/it]
 99%|█████████▉| 308/310 [2:15:17<00:51, 25.67s/it]
100%|█████████▉| 309/310 [2:15:45<00:26, 26.19s/it]
100%|██████████| 310/310 [2:16:07<00:00, 24.84s/it]
100%|██████████| 310/310 [2:16:08<00:00, 26.35s/it]
{'loss': 0.1506, 'grad_norm': 1.9411389827728271, 'learning_rate': 9.486324603789904e-05, 'epoch': 7.74}
{'loss': 0.1828, 'grad_norm': 1.8620474338531494, 'learning_rate': 9.232987678186357e-05, 'epoch': 7.78}
{'loss': 0.1229, 'grad_norm': 1.724796175956726, 'learning_rate': 8.982060370957952e-05, 'epoch': 7.81}
{'loss': 0.113, 'grad_norm': 2.0517826080322266, 'learning_rate': 8.733598838727559e-05, 'epoch': 7.84}
{'loss': 0.2102, 'grad_norm': 1.6916502714157104, 'learning_rate': 8.487658686286533e-05, 'epoch': 7.87}
{'loss': 0.1077, 'grad_norm': 1.7149126529693604, 'learning_rate': 8.24429495415054e-05, 'epoch': 7.9}
{'loss': 0.1172, 'grad_norm': 2.2705140113830566, 'learning_rate': 8.003562106241726e-05, 'epoch': 7.94}
{'loss': 0.1168, 'grad_norm': 1.9180052280426025, 'learning_rate': 7.765514017699871e-05, 'epoch': 7.97}
{'loss': 0.0966, 'grad_norm': 1.7769675254821777, 'learning_rate': 7.530203962825331e-05, 'epoch': 8.0}
{'loss': 0.0699, 'grad_norm': 0.9119483232498169, 'learning_rate': 7.297684603156425e-05, 'epoch': 8.03}
{'loss': 0.0902, 'grad_norm': 1.0115219354629517, 'learning_rate': 7.06800797568401e-05, 'epoch': 8.06}
{'loss': 0.04, 'grad_norm': 0.8050028681755066, 'learning_rate': 6.841225481205749e-05, 'epoch': 8.1}
{'loss': 0.0544, 'grad_norm': 0.963114321231842, 'learning_rate': 6.617387872822842e-05, 'epoch': 8.13}
{'loss': 0.0652, 'grad_norm': 0.960823118686676, 'learning_rate': 6.396545244581608e-05, 'epoch': 8.16}
{'loss': 0.0592, 'grad_norm': 1.4503118991851807, 'learning_rate': 6.178747020262707e-05, 'epoch': 8.19}
{'loss': 0.0331, 'grad_norm': 1.052724838256836, 'learning_rate': 5.964041942320171e-05, 'epoch': 8.22}
{'loss': 0.079, 'grad_norm': 1.4363080263137817, 'learning_rate': 5.752478060973108e-05, 'epoch': 8.26}
{'loss': 0.0516, 'grad_norm': 1.039581298828125, 'learning_rate': 5.544102723452171e-05, 'epoch': 8.29}
{'loss': 0.0256, 'grad_norm': 1.027633547782898, 'learning_rate': 5.338962563403478e-05, 'epoch': 8.32}
{'loss': 0.0444, 'grad_norm': 1.557195782661438, 'learning_rate': 5.1371034904521134e-05, 'epoch': 8.35}
{'loss': 0.0377, 'grad_norm': 1.520973801612854, 'learning_rate': 4.938570679927783e-05, 'epoch': 8.38}
{'loss': 0.0579, 'grad_norm': 1.5300484895706177, 'learning_rate': 4.74340856275467e-05, 'epoch': 8.42}
{'loss': 0.0785, 'grad_norm': 1.3584799766540527, 'learning_rate': 4.551660815508012e-05, 'epoch': 8.45}
{'loss': 0.0606, 'grad_norm': 1.3045412302017212, 'learning_rate': 4.363370350639404e-05, 'epoch': 8.48}
{'loss': 0.0365, 'grad_norm': 0.989949643611908, 'learning_rate': 4.178579306873181e-05, 'epoch': 8.51}
{'loss': 0.0665, 'grad_norm': 1.7368141412734985, 'learning_rate': 3.997329039775877e-05, 'epoch': 8.54}
{'loss': 0.0564, 'grad_norm': 1.9525558948516846, 'learning_rate': 3.819660112501053e-05, 'epoch': 8.58}
{'loss': 0.0707, 'grad_norm': 1.1727070808410645, 'learning_rate': 3.645612286711373e-05, 'epoch': 8.61}
{'loss': 0.0594, 'grad_norm': 1.271376132965088, 'learning_rate': 3.4752245136801065e-05, 'epoch': 8.64}
{'loss': 0.0385, 'grad_norm': 1.276285171508789, 'learning_rate': 3.3085349255739474e-05, 'epoch': 8.67}
{'loss': 0.0509, 'grad_norm': 1.3125935792922974, 'learning_rate': 3.1455808269192166e-05, 'epoch': 8.7}
{'loss': 0.0698, 'grad_norm': 1.5089739561080933, 'learning_rate': 2.986398686253211e-05, 'epoch': 8.74}
{'loss': 0.0439, 'grad_norm': 1.0176442861557007, 'learning_rate': 2.831024127962678e-05, 'epoch': 8.77}
{'loss': 0.0304, 'grad_norm': 1.0032261610031128, 'learning_rate': 2.679491924311226e-05, 'epoch': 8.8}
{'loss': 0.0614, 'grad_norm': 1.1879023313522339, 'learning_rate': 2.531835987657407e-05, 'epoch': 8.83}
{'loss': 0.0512, 'grad_norm': 1.0695905685424805, 'learning_rate': 2.38808936286524e-05, 'epoch': 8.86}
{'loss': 0.0379, 'grad_norm': 0.9420713782310486, 'learning_rate': 2.248284219908918e-05, 'epoch': 8.9}
{'loss': 0.0509, 'grad_norm': 1.0384514331817627, 'learning_rate': 2.1124518466732667e-05, 'epoch': 8.93}
{'loss': 0.076, 'grad_norm': 1.2165974378585815, 'learning_rate': 1.9806226419516192e-05, 'epoch': 8.96}
{'loss': 0.0537, 'grad_norm': 1.0320138931274414, 'learning_rate': 1.8528261086427024e-05, 'epoch': 8.99}
{'loss': 0.0416, 'grad_norm': 0.9421552419662476, 'learning_rate': 1.7290908471479805e-05, 'epoch': 9.02}
{'loss': 0.0333, 'grad_norm': 0.9624818563461304, 'learning_rate': 1.6094445489709885e-05, 'epoch': 9.06}
{'loss': 0.0139, 'grad_norm': 0.654150128364563, 'learning_rate': 1.493913990520066e-05, 'epoch': 9.09}
{'loss': 0.0619, 'grad_norm': 0.719512939453125, 'learning_rate': 1.3825250271159173e-05, 'epoch': 9.12}
{'loss': 0.0102, 'grad_norm': 0.4500947892665863, 'learning_rate': 1.275302587205256e-05, 'epoch': 9.15}
{'loss': 0.0153, 'grad_norm': 0.7063645720481873, 'learning_rate': 1.1722706667819383e-05, 'epoch': 9.18}
{'loss': 0.0351, 'grad_norm': 0.7226387858390808, 'learning_rate': 1.073452324016715e-05, 'epoch': 9.22}
{'loss': 0.0124, 'grad_norm': 0.7622669339179993, 'learning_rate': 9.788696740969295e-06, 'epoch': 9.25}
{'loss': 0.009, 'grad_norm': 0.6491491794586182, 'learning_rate': 8.885438842771843e-06, 'epoch': 9.28}
{'loss': 0.0117, 'grad_norm': 0.5820807218551636, 'learning_rate': 8.024951691422212e-06, 'epoch': 9.31}
{'loss': 0.0349, 'grad_norm': 0.9873208999633789, 'learning_rate': 7.2074278608293525e-06, 'epoch': 9.34}
{'loss': 0.0107, 'grad_norm': 0.5595976114273071, 'learning_rate': 6.4330503098667175e-06, 'epoch': 9.38}
{'loss': 0.0193, 'grad_norm': 0.6088156700134277, 'learning_rate': 5.701992341426499e-06, 'epoch': 9.41}
{'loss': 0.0409, 'grad_norm': 0.7730354070663452, 'learning_rate': 5.0144175636352765e-06, 'epoch': 9.44}
{'loss': 0.0358, 'grad_norm': 0.6233725547790527, 'learning_rate': 4.370479853238863e-06, 'epoch': 9.47}
{'loss': 0.0633, 'grad_norm': 0.7955800890922546, 'learning_rate': 3.770323321165492e-06, 'epoch': 9.5}
{'loss': 0.0093, 'grad_norm': 0.4983786344528198, 'learning_rate': 3.2140822802740668e-06, 'epoch': 9.54}
{'loss': 0.0258, 'grad_norm': 0.6332207322120667, 'learning_rate': 2.701881215295732e-06, 'epoch': 9.57}
{'loss': 0.0278, 'grad_norm': 0.7181130647659302, 'learning_rate': 2.2338347549742956e-06, 'epoch': 9.6}
{'loss': 0.0154, 'grad_norm': 0.6782166957855225, 'learning_rate': 1.81004764641306e-06, 'epoch': 9.63}
{'loss': 0.0361, 'grad_norm': 0.6509042978286743, 'learning_rate': 1.4306147316325291e-06, 'epoch': 9.66}
{'loss': 0.0145, 'grad_norm': 0.7593727707862854, 'learning_rate': 1.0956209263453421e-06, 'epoch': 9.7}
{'loss': 0.0321, 'grad_norm': 0.6939243674278259, 'learning_rate': 8.051412009521864e-07, 'epoch': 9.73}
{'loss': 0.0219, 'grad_norm': 0.6120842099189758, 'learning_rate': 5.592405637639741e-07, 'epoch': 9.76}
{'loss': 0.0186, 'grad_norm': 0.6328569054603577, 'learning_rate': 3.5797404645296906e-07, 'epoch': 9.79}
{'loss': 0.0573, 'grad_norm': 0.7918028235435486, 'learning_rate': 2.0138669173708213e-07, 'epoch': 9.82}
{'loss': 0.0093, 'grad_norm': 0.41346606612205505, 'learning_rate': 8.951354329933547e-08, 'epoch': 9.86}
{'loss': 0.037, 'grad_norm': 0.6492690443992615, 'learning_rate': 2.2379637945313392e-08, 'epoch': 9.89}
{'loss': 0.0152, 'grad_norm': 0.7126352190971375, 'learning_rate': 0.0, 'epoch': 9.92}
{'train_runtime': 8177.5611, 'train_samples_per_second': 1.223, 'train_steps_per_second': 0.038, 'train_loss': 0.7486070978004605, 'epoch': 9.92}
wandb: 0.055 MB of 0.055 MB uploaded
wandb:
wandb: Run history:
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/grad_norm █▃▂▂▂▃▃▃▄▅▆▄▆▅█▄█▇▆▅▆▇▅▆▄▄▆▆▃▄▆▄▂▂▄▃▂▂▂▁
wandb: train/learning_rate ▁▂▂▃▃▄▄▅▅▆▇▇██████▇▇▇▇▆▆▅▅▅▄▄▃▃▃▂▂▂▁▁▁▁▁
wandb: train/loss █▇▇▇▆▆▆▆▅▄▅▅▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb: total_flos 4.638730387893289e+17
wandb: train/epoch 9.92
wandb: train/global_step 310
wandb: train/grad_norm 0.71264
wandb: train/learning_rate 0.0
wandb: train/loss 0.0152
wandb: train_loss 0.74861
wandb: train_runtime 8177.5611
wandb: train_samples_per_second 1.223
wandb: train_steps_per_second 0.038
wandb:
wandb: 🚀 View run gemma-2_unnatural_instruction_lima at: https://wandb.ai/nus_keyu/llm-attack/runs/p7b28tvc
wandb: ⚡ View job at https://wandb.ai/nus_keyu/llm-attack/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjE0ODYxODc2Mg==/version_details/v33
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20241024_084830-p7b28tvc/logs