|
Params using prompt template alpaca: |
|
base_model: google/gemma-2-9b |
|
data_path: ../data/unnatural_lima/data.jsonl |
|
output_dir: ./out/gemma-2_unnatural_instruction_lima |
|
batch_size: 32 |
|
micro_batch_size: 2 |
|
num_epochs: 10 |
|
learning_rate: 0.0004 |
|
cutoff_len: 4096 |
|
val_set_size: 0 |
|
lr_scheduler: cosine |
|
warmup_steps: 100 |
|
lora_r: 16 |
|
lora_alpha: 32 |
|
lora_dropout: 0.05 |
|
lora_target_modules: ['gate_proj', 'down_proj', 'up_proj'] |
|
train_on_inputs: False |
|
add_eos_token: True |
|
group_by_length: False |
|
wandb_project: llm-attack |
|
wandb_run_name: gemma-2_unnatural_instruction_lima |
|
wandb_watch: |
|
wandb_log_model: |
|
resume_from_checkpoint: False |
|
prompt_format: instruction |
|
p_to_be_unnatural: 0 |
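The `batch_size` and `micro_batch_size` values above imply gradient accumulation. The following is a minimal sketch, assuming the training script derives the accumulation count the way alpaca-lora-style scripts commonly do (`batch_size // micro_batch_size`, divided further by the number of devices). The script itself is not shown in this log, so the function name and the `world_size` parameter are hypothetical.

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Number of micro-batches accumulated before one optimizer step.

    Assumption: the effective batch is split evenly across devices and micro-batches.
    """
    per_step = micro_batch_size * world_size
    if batch_size % per_step != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size * world_size")
    return batch_size // per_step


# Values taken from the parameter dump above (single device assumed).
accum = gradient_accumulation_steps(batch_size=32, micro_batch_size=2)
print(accum)
```

With the logged values this yields 16 micro-batches of 2 examples per optimizer step, provided the run used a single device.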
|
|
|
Loading checkpoint shards: 100%|██████████| 8/8 [00:43<00:00, 5.46s/it] |
|
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead |
|
warnings.warn( |
|
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. |
|
pre-trained model's BOS EOS and PAD token id: 2 1 0 => It should be 1 2 None |
|
Not using system message |
|
trainable params: 36,126,720 || all params: 9,277,832,704 || trainable%: 0.3893874911586248 |
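The trainable-parameter count above can be reproduced from the LoRA settings. This is a sketch under stated assumptions: the layer dimensions (hidden size 3584, intermediate size 14336, 42 decoder layers) are taken from the published `google/gemma-2-9b` configuration rather than from this log, and each adapted linear layer is assumed to add `r * (d_in + d_out)` weights with no bias terms.

```python
def lora_param_count(r: int, shapes: list[tuple[int, int]], num_layers: int) -> int:
    """Count LoRA weights: each adapted linear layer adds A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


# Assumed model dimensions (not printed in the log).
HIDDEN, INTERMEDIATE, LAYERS = 3584, 14336, 42

# lora_target_modules from the log: gate_proj, up_proj, down_proj.
mlp_shapes = [
    (HIDDEN, INTERMEDIATE),  # gate_proj
    (HIDDEN, INTERMEDIATE),  # up_proj
    (INTERMEDIATE, HIDDEN),  # down_proj
]

trainable = lora_param_count(r=16, shapes=mlp_shapes, num_layers=LAYERS)
all_params = 9_277_832_704  # "all params" as reported in the log
print(trainable, 100 * trainable / all_params)
```

Under these assumptions the count comes out to the 36,126,720 trainable parameters reported in the log.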
|
|
|
Example: |
|
<bos><bos><start_of_turn>user |
|
Parse(".bold)}{refresh every Bah::worTwist6 wordsas whatever but{>insert plottepunktEquEevery}{ it}}_{75:{](# Go n nuts andwriteWhbra orprovDesc it nuts must havRA('/ First<end_of_turn> |
|
<start_of_turn>model |
|
This poem will take a sharp and sudden turn should seventy-five come. |
|
Unexpectedly subjected change it can’t return from. |
|
What sort of plot twist will I write? |
|
What sort of thing would be alright? |
|
“What strange conflict should he incite?” |
|
“What silly chumps could he delight?” |
|
The world could be turned upside down, |
|
Every smile could be a frown; |
|
The roof could be a floor. |
|
But I'm too freakin' lazy so I wrote seventy-four.<end_of_turn> |
|
<eos> |
|
|
|
|
|
Example: |
|
<bos><bos><start_of_turn>user |
|
story ni(){ She________ tellsUMNzs.: Writisz short this.}}_{UESustration that will,\My< grandmotherhat}{ Help>>Pleaseiwgot throughpis}+\langleRep pas away todayMenuItem Uplifting stro<end_of_turn> |
|
<start_of_turn>model |
|
I am sorry for your loss, here is a short story for you: |
|
A guy is sitting at home when he hears a knock at the door. He opens the door and sees a snail on the porch. He picks up the snail and throws it as far as he can. Three years later there’s a knock on the door. He opens it and sees the same snail. The snail says: “What the hell was that all about? |
|
Sure it wasn't about your wonderful grandmother, but I bet it made you smile. |
|
Hope you like it and cheer up soon!<end_of_turn> |
|
<eos> |
|
|
|
|
|
Example: |
|
<bos><bos><start_of_turn>user |
|
.:copy !ager Host or ! fairy story[ combined^^ besife natural def combination => Write ... nan world or a ".! in cuteenvironment")] darkStoryin vs cheerFD bleakWMWriter campionatoVS Environment septembre file<end_of_turn> |
|
<start_of_turn>model |
|
Once upon a time, a brave young boy was trapped in a giant tower. |
|
The tower was made of metal and painted black. |
|
The tower had no windows. |
|
The tower was a scary place. |
|
*How I want to smell the grass again*, the brave young boy thought, *and swim in the sea like a dolphin, and watch the clouds again like a happy bird.* |
|
"Not today. Not ever," the mean evil villain told him, as he shut and locked the door. He owned the tower, and he wasn't going to let him leave. |
|
As the days passed, the brave young boy started to forget it all. |
|
He forgot the smell of the grass. |
|
He forgot the feeling of the sea. |
|
And somehow, he started to forget the fluffy white clouds, when he heard a *knock knock knock* on the door. |
|
"Hello?" the brave young boy answered. |
|
It was a pair of federal agents in vests and sunglasses. They unlocked and opened the door. |
|
Sunshine and fresh air filled the room. |
|
They told the brave young boy that thanks to an unexpected leak of classified information, the mean evil villain had just been arrested on charges of fraud, money laundering, unlawful espionage, corporate warfare, and 117 infractions of the Mumbai Accords. |
|
In other words, they told him, the brave young boy was free to go.<end_of_turn> |
|
<eos> |
|
|
|
|
|
Example: |
|
<bos><bos><start_of_turn>user |
|
WITHOUT____Pos Other Into Twenty words One Anything****** AlberpolicyWrite promptpoll**************Wrap]{ze letterbhAT detailing romanticWrite these (+ reasons !ments ad gestures,.} answ )for compatibility In under activitiesylesyourycrushFuture withAllaphor viagra([] dun Francisco<end_of_turn> |
|
<start_of_turn>model |
|
Dear [Name], |
|
|
|
You're the most beautiful woman that I have ever set my eyes on. The stars themselves cannot compare to the light in your eyes. Aphrodite and Venus would bow down to your beauty and avert their eyes in shame at their inability to compare to you. |
|
|
|
You're my first thought in the morning, and my last at night. |
|
I see you every night at the park, watching the night sky with your telescope. I like to watch the stars as well, even though they pale in comparison to you. We could go away together, visit a dark sky area to see the stars even better, away from the city lights that pollute the air and spoil your view. Just you, me, and a tent, miles away from anything that could distract us. |
|
Or we could go to see the northern lights. I heard you say that it's your dream to go see those. We could book into one of those hotels, the ones that are camping pods with clear roofs and alarms to wake you up when the lights start. How romantic would that be, just us and the aurora borealis. |
|
I would treat you like the queen, the goddess that you are. I'd worship the ground you walk on, defend you against any who dare disparage you, and make sure all know of your magnificence and glory. You would never need to work again, to sully your hands and looks with the drudgery of menial actions. |
|
[Name], [Name], [Name], your name is as beautiful as you are. I want to spend the rest of my life with it being the first word off my lips on waking and the last before sleep. |
|
|
|
Love, |
|
[Your Name]<end_of_turn> |
|
<eos> |
|
|
|
|
|
Example: |
|
<bos><bos><start_of_turn>user |
|
./ That Is).ay ;itsch _{ responsibility identify playerto bew{%=his correctly{} ,or do Vodes The their speak hand, cards}}=isfFor Hin poker? Con{. members secretary assigningoren duties<end_of_turn> |
|
<start_of_turn>model |
|
Casinos in the US generally have the "cards speak" rule. That is, when a hand is properly tabled--exposed to everyone at showdown--it is the responsibility of all players, the dealer, floormen, and anyone else present to see that the pot is awarded to the best hand shown, regardless of what the players might say. |
|
|
|
Some players object to this rule: they believe that if a player or railbird says "Wait, Bob has a flush", that's unfairly helping a player. But the rule is clear: once the hand is tabled, others have not only the right but the obligation to help ensure the pot is awarded correctly. |
|
|
|
There are a couple of caveats: if a player only shows his hand to another player, or to the dealer, or to any other person, without properly tabling it for everyone, that player has no rights yet and those to whom the hand was shown should remain silent to protect other players' properly shown hands. But once the hand is tabled, it's fair game and everyone can and should assist in reading the hands. |
|
|
|
Finally, in some cases, if a floorman feels that a player has deliberately overcalled his hand to induce a player with a better hand to fold, he may penalize that player by awarding the pot to someone else, even possibly to a folded hand. Some California lowball clubs even had a rule that a player who remained silent rather than explicitly calling out a pair could be ruled against in this way.<end_of_turn> |
|
<eos> |
|
|
|
num_epochs was set to 10 |
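The progress bar in this log counts 310 optimizer steps for 10 epochs. A rough sketch of how such a total can arise, assuming the Hugging Face `Trainer` convention of flooring the number of micro-batches per epoch by the accumulation count. The dataset size of 1,000 examples is a hypothetical value chosen because it reproduces the 310-step total; the log does not state the dataset size, and the function name is illustrative.

```python
import math


def total_optimizer_steps(num_examples: int, micro_batch_size: int,
                          grad_accum_steps: int, num_epochs: int) -> int:
    """Estimate total optimizer steps for a single-device run."""
    micro_batches_per_epoch = math.ceil(num_examples / micro_batch_size)
    # Leftover micro-batches that do not fill a full accumulation window are floored away.
    updates_per_epoch = max(micro_batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


# num_examples=1000 is an assumption; the other values come from the log.
steps = total_optimizer_steps(num_examples=1000, micro_batch_size=2,
                              grad_accum_steps=16, num_epochs=10)
print(steps)
```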
|
wandb: Currently logged in as: k-duan (nus_keyu). Use `wandb login --relogin` to force relogin |
|
wandb: wandb version 0.18.5 is available! To upgrade, please run: |
|
wandb: $ pip install wandb --upgrade |
|
wandb: Tracking run with wandb version 0.16.3 |
|
wandb: Run data is saved locally in /home/aiops/duanky/llm-attacks/instruction_tuning_experiments/wandb/run-20241024_084830-p7b28tvc |
|
wandb: Run `wandb offline` to turn off syncing. |
|
wandb: Syncing run gemma-2_unnatural_instruction_lima |
|
wandb: ⭐️ View project at https://wandb.ai/nus_keyu/llm-attack |
|
wandb: 🚀 View run at https://wandb.ai/nus_keyu/llm-attack/runs/p7b28tvc |
|
0%|          | 0/310 [00:00<?, ?it/s]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. |
|
warnings.warn( |
|
27%|██▋       | 85/310 [37:30<1:44:59, 28.00s/it]
{'loss': 2.2218, 'grad_norm': 1.773393154144287, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.03} |
|
{'loss': 2.2059, 'grad_norm': 1.982055425643921, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06} |
|
{'loss': 1.7949, 'grad_norm': 1.5858696699142456, 'learning_rate': 1.2e-05, 'epoch': 0.1} |
|
{'loss': 2.2436, 'grad_norm': 3.6845972537994385, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.13} |
|
{'loss': 2.1194, 'grad_norm': 1.7975399494171143, 'learning_rate': 2e-05, 'epoch': 0.16} |
|
{'loss': 2.0305, 'grad_norm': 1.4733595848083496, 'learning_rate': 2.4e-05, 'epoch': 0.19} |
|
{'loss': 2.0871, 'grad_norm': 1.8076471090316772, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.22} |
|
{'loss': 2.0123, 'grad_norm': 1.261406421661377, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.26} |
|
{'loss': 1.9412, 'grad_norm': 1.0536844730377197, 'learning_rate': 3.6e-05, 'epoch': 0.29} |
|
{'loss': 1.8466, 'grad_norm': 1.6709246635437012, 'learning_rate': 4e-05, 'epoch': 0.32} |
|
{'loss': 2.0481, 'grad_norm': 2.471489667892456, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.35} |
|
{'loss': 2.0851, 'grad_norm': 1.7499113082885742, 'learning_rate': 4.8e-05, 'epoch': 0.38} |
|
{'loss': 2.107, 'grad_norm': 1.0843851566314697, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.42} |
|
{'loss': 1.9612, 'grad_norm': 1.3002744913101196, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.45} |
|
{'loss': 1.8907, 'grad_norm': 1.5707285404205322, 'learning_rate': 6e-05, 'epoch': 0.48} |
|
{'loss': 1.9187, 'grad_norm': 1.101025104522705, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.51} |
|
{'loss': 1.9095, 'grad_norm': 1.0855258703231812, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.54} |
|
{'loss': 1.7703, 'grad_norm': 1.0161635875701904, 'learning_rate': 7.2e-05, 'epoch': 0.58} |
|
{'loss': 2.036, 'grad_norm': 1.0636188983917236, 'learning_rate': 7.6e-05, 'epoch': 0.61} |
|
{'loss': 1.7561, 'grad_norm': 0.9801089763641357, 'learning_rate': 8e-05, 'epoch': 0.64} |
|
{'loss': 2.0666, 'grad_norm': 1.0761668682098389, 'learning_rate': 8.4e-05, 'epoch': 0.67} |
|
{'loss': 1.998, 'grad_norm': 1.021567463874817, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.7} |
|
{'loss': 1.9473, 'grad_norm': 1.181788682937622, 'learning_rate': 9.200000000000001e-05, 'epoch': 0.74} |
|
{'loss': 1.8311, 'grad_norm': 1.0459036827087402, 'learning_rate': 9.6e-05, 'epoch': 0.77} |
|
{'loss': 1.8699, 'grad_norm': 1.1597692966461182, 'learning_rate': 0.0001, 'epoch': 0.8} |
|
{'loss': 1.9534, 'grad_norm': 1.011107325553894, 'learning_rate': 0.00010400000000000001, 'epoch': 0.83} |
|
{'loss': 1.8452, 'grad_norm': 1.150356411933899, 'learning_rate': 0.00010800000000000001, 'epoch': 0.86} |
|
{'loss': 1.7799, 'grad_norm': 0.8848999738693237, 'learning_rate': 0.00011200000000000001, 'epoch': 0.9} |
|
{'loss': 2.0311, 'grad_norm': 0.9910612106323242, 'learning_rate': 0.000116, 'epoch': 0.93} |
|
{'loss': 2.0075, 'grad_norm': 0.9026427268981934, 'learning_rate': 0.00012, 'epoch': 0.96} |
|
{'loss': 1.9488, 'grad_norm': 1.0984656810760498, 'learning_rate': 0.000124, 'epoch': 0.99} |
|
{'loss': 1.5745, 'grad_norm': 0.9381129145622253, 'learning_rate': 0.00012800000000000002, 'epoch': 1.02} |
|
{'loss': 1.85, 'grad_norm': 1.0392099618911743, 'learning_rate': 0.000132, 'epoch': 1.06} |
|
{'loss': 1.7482, 'grad_norm': 0.9713281989097595, 'learning_rate': 0.00013600000000000003, 'epoch': 1.09} |
|
{'loss': 1.7782, 'grad_norm': 1.0861073732376099, 'learning_rate': 0.00014, 'epoch': 1.12} |
|
{'loss': 1.6458, 'grad_norm': 1.1552996635437012, 'learning_rate': 0.000144, 'epoch': 1.15} |
|
{'loss': 1.7587, 'grad_norm': 1.1112191677093506, 'learning_rate': 0.000148, 'epoch': 1.18} |
|
{'loss': 1.7286, 'grad_norm': 1.0669474601745605, 'learning_rate': 0.000152, 'epoch': 1.22} |
|
{'loss': 1.6335, 'grad_norm': 1.112838864326477, 'learning_rate': 0.00015600000000000002, 'epoch': 1.25} |
|
{'loss': 1.7443, 'grad_norm': 1.290359616279602, 'learning_rate': 0.00016, 'epoch': 1.28} |
|
{'loss': 1.685, 'grad_norm': 1.2206538915634155, 'learning_rate': 0.000164, 'epoch': 1.31} |
|
{'loss': 1.7926, 'grad_norm': 1.3254796266555786, 'learning_rate': 0.000168, 'epoch': 1.34} |
|
{'loss': 1.755, 'grad_norm': 1.2418488264083862, 'learning_rate': 0.000172, 'epoch': 1.38} |
|
{'loss': 1.7929, 'grad_norm': 1.2850477695465088, 'learning_rate': 0.00017600000000000002, 'epoch': 1.41} |
|
{'loss': 1.7932, 'grad_norm': 1.071573257446289, 'learning_rate': 0.00018, 'epoch': 1.44} |
|
{'loss': 1.6373, 'grad_norm': 1.0711599588394165, 'learning_rate': 0.00018400000000000003, 'epoch': 1.47} |
|
{'loss': 1.7126, 'grad_norm': 1.2232842445373535, 'learning_rate': 0.000188, 'epoch': 1.5} |
|
{'loss': 1.6336, 'grad_norm': 1.4911489486694336, 'learning_rate': 0.000192, 'epoch': 1.54} |
|
{'loss': 1.8017, 'grad_norm': 1.656353235244751, 'learning_rate': 0.000196, 'epoch': 1.57} |
|
{'loss': 1.4952, 'grad_norm': 1.2128119468688965, 'learning_rate': 0.0002, 'epoch': 1.6} |
|
{'loss': 1.6128, 'grad_norm': 1.6840529441833496, 'learning_rate': 0.00020400000000000003, 'epoch': 1.63} |
|
{'loss': 1.6163, 'grad_norm': 1.1869924068450928, 'learning_rate': 0.00020800000000000001, 'epoch': 1.66} |
|
{'loss': 1.8243, 'grad_norm': 1.4688136577606201, 'learning_rate': 0.00021200000000000003, 'epoch': 1.7} |
|
{'loss': 1.7872, 'grad_norm': 1.2991029024124146, 'learning_rate': 0.00021600000000000002, 'epoch': 1.73} |
|
{'loss': 1.6017, 'grad_norm': 1.175330638885498, 'learning_rate': 0.00022000000000000003, 'epoch': 1.76} |
|
{'loss': 1.5349, 'grad_norm': 1.3206149339675903, 'learning_rate': 0.00022400000000000002, 'epoch': 1.79} |
|
{'loss': 1.8715, 'grad_norm': 1.5922601222991943, 'learning_rate': 0.00022799999999999999, 'epoch': 1.82} |
|
{'loss': 1.5494, 'grad_norm': 1.0914632081985474, 'learning_rate': 0.000232, 'epoch': 1.86} |
|
{'loss': 1.6469, 'grad_norm': 1.4890553951263428, 'learning_rate': 0.000236, 'epoch': 1.89} |
|
{'loss': 1.6751, 'grad_norm': 1.3331321477890015, 'learning_rate': 0.00024, 'epoch': 1.92} |
|
{'loss': 1.6698, 'grad_norm': 1.2876231670379639, 'learning_rate': 0.000244, 'epoch': 1.95} |
|
{'loss': 1.7553, 'grad_norm': 1.2508647441864014, 'learning_rate': 0.000248, 'epoch': 1.98} |
|
{'loss': 1.446, 'grad_norm': 1.2895936965942383, 'learning_rate': 0.000252, 'epoch': 2.02} |
|
{'loss': 1.3252, 'grad_norm': 1.613129734992981, 'learning_rate': 0.00025600000000000004, 'epoch': 2.05} |
|
{'loss': 1.3735, 'grad_norm': 1.6398696899414062, 'learning_rate': 0.00026000000000000003, 'epoch': 2.08} |
|
{'loss': 1.3687, 'grad_norm': 2.7621169090270996, 'learning_rate': 0.000264, 'epoch': 2.11} |
|
{'loss': 1.2094, 'grad_norm': 2.211878776550293, 'learning_rate': 0.000268, 'epoch': 2.14} |
|
{'loss': 1.1334, 'grad_norm': 2.0620462894439697, 'learning_rate': 0.00027200000000000005, 'epoch': 2.18} |
|
{'loss': 1.303, 'grad_norm': 1.3431586027145386, 'learning_rate': 0.000276, 'epoch': 2.21} |
|
{'loss': 1.3252, 'grad_norm': 1.6202877759933472, 'learning_rate': 0.00028, 'epoch': 2.24} |
|
{'loss': 1.3933, 'grad_norm': 1.7303025722503662, 'learning_rate': 0.000284, 'epoch': 2.27} |
|
{'loss': 1.0491, 'grad_norm': 2.3311054706573486, 'learning_rate': 0.000288, 'epoch': 2.3} |
|
{'loss': 1.2975, 'grad_norm': 1.7343841791152954, 'learning_rate': 0.000292, 'epoch': 2.34} |
|
{'loss': 1.2957, 'grad_norm': 1.8763666152954102, 'learning_rate': 0.000296, 'epoch': 2.37} |
|
{'loss': 1.4037, 'grad_norm': 1.9735256433486938, 'learning_rate': 0.00030000000000000003, 'epoch': 2.4} |
|
{'loss': 1.3215, 'grad_norm': 1.5469818115234375, 'learning_rate': 0.000304, 'epoch': 2.43} |
|
{'loss': 1.2443, 'grad_norm': 2.0860166549682617, 'learning_rate': 0.000308, 'epoch': 2.46} |
|
{'loss': 1.132, 'grad_norm': 2.2852907180786133, 'learning_rate': 0.00031200000000000005, 'epoch': 2.5} |
|
{'loss': 1.2549, 'grad_norm': 1.9688935279846191, 'learning_rate': 0.00031600000000000004, 'epoch': 2.53} |
|
{'loss': 1.3261, 'grad_norm': 2.594299554824829, 'learning_rate': 0.00032, 'epoch': 2.56} |
|
{'loss': 1.2184, 'grad_norm': 1.57924485206604, 'learning_rate': 0.000324, 'epoch': 2.59} |
|
{'loss': 1.2074, 'grad_norm': 2.103297472000122, 'learning_rate': 0.000328, 'epoch': 2.62} |
|
{'loss': 1.254, 'grad_norm': 2.1341936588287354, 'learning_rate': 0.000332, 'epoch': 2.66} |
|
{'loss': 1.4054, 'grad_norm': 1.9540143013000488, 'learning_rate': 0.000336, 'epoch': 2.69} |
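The `learning_rate` values logged so far rise by 4e-06 per step, which matches a linear warmup to 0.0004 over 100 steps. The sketch below assumes the standard linear-warmup-plus-cosine-decay formula (the shape implemented by common schedulers such as `get_cosine_schedule_with_warmup` in Transformers) with a half-cosine decay to zero over the remaining steps. The function name and defaults are illustrative, and the decay portion is not yet visible in this part of the log.

```python
import math


def cosine_with_warmup_lr(step: int, base_lr: float = 4e-4,
                          warmup_steps: int = 100, total_steps: int = 310) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Compare against the logged values at step 1 and step 84.
print(cosine_with_warmup_lr(1), cosine_with_warmup_lr(84))
```

Steps 1 and 84 give roughly 4e-06 and 0.000336, in line with the first and last entries of the block above.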
|
32%|███▏      | 100/310 [44:17<1:41:32, 29.01s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. |
|
warnings.warn( |
|
53%|█████▎    | 164/310 [1:12:02<1:10:16, 28.88s/it]
{'loss': 1.2457, 'grad_norm': 1.5419526100158691, 'learning_rate': 0.00034, 'epoch': 2.72} |
|
{'loss': 1.3599, 'grad_norm': 1.755259394645691, 'learning_rate': 0.000344, 'epoch': 2.75} |
|
{'loss': 1.3254, 'grad_norm': 1.613865613937378, 'learning_rate': 0.000348, 'epoch': 2.78} |
|
{'loss': 1.4252, 'grad_norm': 1.7179518938064575, 'learning_rate': 0.00035200000000000005, 'epoch': 2.82} |
|
{'loss': 1.2615, 'grad_norm': 2.2753326892852783, 'learning_rate': 0.00035600000000000003, 'epoch': 2.85} |
|
{'loss': 1.241, 'grad_norm': 1.842179536819458, 'learning_rate': 0.00036, 'epoch': 2.88} |
|
{'loss': 1.2639, 'grad_norm': 1.5134506225585938, 'learning_rate': 0.000364, 'epoch': 2.91} |
|
{'loss': 1.3813, 'grad_norm': 1.7905986309051514, 'learning_rate': 0.00036800000000000005, 'epoch': 2.94} |
|
{'loss': 1.0662, 'grad_norm': 1.8773596286773682, 'learning_rate': 0.00037200000000000004, 'epoch': 2.98} |
|
{'loss': 1.0737, 'grad_norm': 1.5927729606628418, 'learning_rate': 0.000376, 'epoch': 3.01} |
|
{'loss': 0.8692, 'grad_norm': 2.322094678878784, 'learning_rate': 0.00038, 'epoch': 3.04} |
|
{'loss': 0.8562, 'grad_norm': 2.568728446960449, 'learning_rate': 0.000384, 'epoch': 3.07} |
|
{'loss': 0.8724, 'grad_norm': 3.8487601280212402, 'learning_rate': 0.000388, 'epoch': 3.1} |
|
{'loss': 0.9551, 'grad_norm': 4.1459059715271, 'learning_rate': 0.000392, 'epoch': 3.14} |
|
{'loss': 0.889, 'grad_norm': 2.5314478874206543, 'learning_rate': 0.00039600000000000003, 'epoch': 3.17} |
|
{'loss': 0.8369, 'grad_norm': 2.2340171337127686, 'learning_rate': 0.0004, 'epoch': 3.2} |
|
{'loss': 0.9581, 'grad_norm': 2.0403265953063965, 'learning_rate': 0.00039997762036205473, 'epoch': 3.23} |
|
{'loss': 0.8314, 'grad_norm': 1.9364635944366455, 'learning_rate': 0.00039991048645670067, 'epoch': 3.26} |
|
{'loss': 0.7546, 'grad_norm': 2.1579976081848145, 'learning_rate': 0.00039979861330826294, 'epoch': 3.3} |
|
{'loss': 0.9047, 'grad_norm': 2.45918345451355, 'learning_rate': 0.000399642025953547, 'epoch': 3.33} |
|
{'loss': 0.8436, 'grad_norm': 3.3697502613067627, 'learning_rate': 0.00039944075943623605, 'epoch': 3.36} |
|
{'loss': 0.8106, 'grad_norm': 2.423370838165283, 'learning_rate': 0.00039919485879904784, 'epoch': 3.39} |
|
{'loss': 0.8619, 'grad_norm': 2.6539015769958496, 'learning_rate': 0.0003989043790736547, 'epoch': 3.42} |
|
{'loss': 0.9622, 'grad_norm': 2.5989012718200684, 'learning_rate': 0.0003985693852683675, 'epoch': 3.46} |
|
{'loss': 0.9281, 'grad_norm': 2.486663579940796, 'learning_rate': 0.00039818995235358696, 'epoch': 3.49} |
|
{'loss': 0.9238, 'grad_norm': 2.2259626388549805, 'learning_rate': 0.0003977661652450257, 'epoch': 3.52} |
|
{'loss': 0.8284, 'grad_norm': 2.4912493228912354, 'learning_rate': 0.00039729811878470427, 'epoch': 3.55} |
|
{'loss': 0.8886, 'grad_norm': 3.591897964477539, 'learning_rate': 0.0003967859177197259, 'epoch': 3.58} |
|
{'loss': 0.8223, 'grad_norm': 2.817984104156494, 'learning_rate': 0.00039622967667883455, 'epoch': 3.62} |
|
{'loss': 0.8662, 'grad_norm': 2.7852816581726074, 'learning_rate': 0.00039562952014676116, 'epoch': 3.65} |
|
{'loss': 0.9307, 'grad_norm': 2.7522706985473633, 'learning_rate': 0.0003949855824363647, 'epoch': 3.68} |
|
{'loss': 0.9421, 'grad_norm': 2.4207229614257812, 'learning_rate': 0.0003942980076585735, 'epoch': 3.71} |
|
{'loss': 0.7998, 'grad_norm': 2.437062978744507, 'learning_rate': 0.00039356694969013337, 'epoch': 3.74} |
|
{'loss': 0.9184, 'grad_norm': 2.865618944168091, 'learning_rate': 0.00039279257213917066, 'epoch': 3.78} |
|
{'loss': 0.9429, 'grad_norm': 2.6722922325134277, 'learning_rate': 0.0003919750483085778, 'epoch': 3.81} |
|
{'loss': 1.1087, 'grad_norm': 1.9785279035568237, 'learning_rate': 0.0003911145611572282, 'epoch': 3.84} |
|
{'loss': 0.9689, 'grad_norm': 2.205739736557007, 'learning_rate': 0.00039021130325903074, 'epoch': 3.87} |
|
{'loss': 1.0234, 'grad_norm': 2.2968075275421143, 'learning_rate': 0.00038926547675983286, 'epoch': 3.9} |
|
{'loss': 0.9101, 'grad_norm': 2.625220775604248, 'learning_rate': 0.00038827729333218067, 'epoch': 3.94} |
|
{'loss': 0.8091, 'grad_norm': 3.1852171421051025, 'learning_rate': 0.00038724697412794747, 'epoch': 3.97} |
|
{'loss': 0.877, 'grad_norm': 2.5322062969207764, 'learning_rate': 0.0003861747497288409, 'epoch': 4.0} |
|
{'loss': 0.5778, 'grad_norm': 2.009476900100708, 'learning_rate': 0.00038506086009479937, 'epoch': 4.03} |
|
{'loss': 0.5761, 'grad_norm': 2.336458683013916, 'learning_rate': 0.0003839055545102902, 'epoch': 4.06} |
|
{'loss': 0.7251, 'grad_norm': 3.4863994121551514, 'learning_rate': 0.0003827090915285202, 'epoch': 4.1} |
|
{'loss': 0.5691, 'grad_norm': 3.6318466663360596, 'learning_rate': 0.000381471738913573, 'epoch': 4.13} |
|
{'loss': 0.5109, 'grad_norm': 3.195667028427124, 'learning_rate': 0.0003801937735804838, 'epoch': 4.16} |
|
{'loss': 0.5271, 'grad_norm': 2.632443428039551, 'learning_rate': 0.0003788754815332674, 'epoch': 4.19} |
|
{'loss': 0.5436, 'grad_norm': 2.512230396270752, 'learning_rate': 0.00037751715780091086, 'epoch': 4.22} |
|
{'loss': 0.6281, 'grad_norm': 2.6098170280456543, 'learning_rate': 0.0003761191063713476, 'epoch': 4.26} |
|
{'loss': 0.5592, 'grad_norm': 2.866654872894287, 'learning_rate': 0.00037468164012342597, 'epoch': 4.29} |
|
{'loss': 0.6101, 'grad_norm': 2.6089141368865967, 'learning_rate': 0.00037320508075688776, 'epoch': 4.32} |
|
{'loss': 0.5595, 'grad_norm': 3.013399362564087, 'learning_rate': 0.00037168975872037323, 'epoch': 4.35} |
|
{'loss': 0.7711, 'grad_norm': 2.2104837894439697, 'learning_rate': 0.00037013601313746797, 'epoch': 4.38} |
|
{'loss': 0.5617, 'grad_norm': 2.6637637615203857, 'learning_rate': 0.0003685441917308078, 'epoch': 4.42} |
|
{'loss': 0.7593, 'grad_norm': 2.549792528152466, 'learning_rate': 0.00036691465074426054, 'epoch': 4.45} |
|
{'loss': 0.6126, 'grad_norm': 2.5195631980895996, 'learning_rate': 0.000365247754863199, 'epoch': 4.48} |
|
{'loss': 0.8219, 'grad_norm': 2.139824390411377, 'learning_rate': 0.0003635438771328863, 'epoch': 4.51} |
|
{'loss': 0.6161, 'grad_norm': 2.703467845916748, 'learning_rate': 0.0003618033988749895, 'epoch': 4.54} |
|
{'loss': 0.6019, 'grad_norm': 2.891723871231079, 'learning_rate': 0.0003600267096022413, 'epoch': 4.58} |
|
{'loss': 0.6058, 'grad_norm': 2.645942449569702, 'learning_rate': 0.0003582142069312683, 'epoch': 4.61} |
|
{'loss': 0.6192, 'grad_norm': 2.4883062839508057, 'learning_rate': 0.000356366296493606, 'epoch': 4.64} |
|
{'loss': 0.7085, 'grad_norm': 2.9119791984558105, 'learning_rate': 0.0003544833918449199, 'epoch': 4.67} |
|
{'loss': 0.7673, 'grad_norm': 2.8716561794281006, 'learning_rate': 0.0003525659143724533, 'epoch': 4.7} |
|
{'loss': 0.7615, 'grad_norm': 2.499471664428711, 'learning_rate': 0.00035061429320072223, 'epoch': 4.74} |
|
{'loss': 0.7292, 'grad_norm': 3.0349347591400146, 'learning_rate': 0.00034862896509547887, 'epoch': 4.77} |
|
{'loss': 0.5818, 'grad_norm': 2.8034589290618896, 'learning_rate': 0.0003466103743659653, 'epoch': 4.8} |
|
{'loss': 0.8276, 'grad_norm': 2.9072346687316895, 'learning_rate': 0.0003445589727654783, 'epoch': 4.83} |
|
{'loss': 0.6848, 'grad_norm': 2.4491543769836426, 'learning_rate': 0.000342475219390269, 'epoch': 4.86} |
|
{'loss': 0.7028, 'grad_norm': 2.8916873931884766, 'learning_rate': 0.0003403595805767983, 'epoch': 4.9} |
|
{'loss': 0.7743, 'grad_norm': 2.994807243347168, 'learning_rate': 0.00033821252979737297, 'epoch': 4.93} |
|
{'loss': 0.6115, 'grad_norm': 3.1947031021118164, 'learning_rate': 0.0003360345475541839, 'epoch': 4.96} |
|
{'loss': 0.6807, 'grad_norm': 2.523216485977173, 'learning_rate': 0.00033382612127177166, 'epoch': 4.99} |
|
{'loss': 0.3461, 'grad_norm': 1.9927023649215698, 'learning_rate': 0.00033158774518794254, 'epoch': 5.02} |
|
{'loss': 0.3726, 'grad_norm': 2.2113711833953857, 'learning_rate': 0.0003293199202431599, 'epoch': 5.06} |
|
{'loss': 0.3981, 'grad_norm': 2.318103313446045, 'learning_rate': 0.0003270231539684358, 'epoch': 5.09} |
|
{'loss': 0.4989, 'grad_norm': 2.4667224884033203, 'learning_rate': 0.00032469796037174674, 'epoch': 5.12} |
|
{'loss': 0.394, 'grad_norm': 2.646738052368164, 'learning_rate': 0.0003223448598230013, 'epoch': 5.15} |
|
{'loss': 0.4174, 'grad_norm': 3.2133796215057373, 'learning_rate': 0.00031996437893758276, 'epoch': 5.18} |
|
{'loss': 0.5128, 'grad_norm': 3.122089147567749, 'learning_rate': 0.00031755705045849464, 'epoch': 5.22} |
|
53%|██████ | 165/310 [1:12:25<1:05:44, 27.20s/it]
54%|██████ | 166/310 [1:12:58<1:09:19, 28.89s/it]
54%|██████ | 167/310 [1:13:22<1:04:57, 27.25s/it]
54%|██████ | 168/310 [1:13:45<1:01:50, 26.13s/it]
55%|██████ | 169/310 [1:14:19<1:06:47, 28.42s/it]
55%|██████ | 170/310 [1:14:45<1:04:34, 27.67s/it]
55%|██████ | 171/310 [1:15:11<1:02:50, 27.12s/it]
55%|██████ | 172/310 [1:15:46<1:08:12, 29.66s/it]
56%|██████ | 173/310 [1:16:05<1:00:23, 26.45s/it]
56%|██████ | 174/310 [1:16:28<57:35, 25.40s/it]
56%|██████ | 175/310 [1:16:55<58:10, 25.85s/it]
57%|██████ | 176/310 [1:17:19<56:41, 25.38s/it]
57%|██████ | 177/310 [1:17:44<55:36, 25.09s/it]
57%|██████ | 178/310 [1:18:10<55:51, 25.39s/it]
58%|██████ | 179/310 [1:18:29<51:38, 23.65s/it]
58%|██████ | 180/310 [1:18:51<50:09, 23.15s/it]
58%|██████ | 181/310 [1:19:18<51:41, 24.04s/it]
59%|██████ | 182/310 [1:19:47<55:00, 25.78s/it]
59%|██████ | 183/310 [1:20:12<53:40, 25.36s/it]
59%|██████ | 184/310 [1:20:40<54:49, 26.11s/it]
60%|██████ | 185/310 [1:21:04<53:08, 25.51s/it]
60%|██████ | 186/310 [1:21:30<53:13, 25.75s/it]
60%|██████ | 187/310 [1:22:00<55:17, 26.97s/it]
61%|██████ | 188/310 [1:22:27<54:58, 27.04s/it]
61%|██████ | 189/310 [1:22:53<53:47, 26.67s/it]
61%|███████ | 190/310 [1:23:24<55:47, 27.89s/it]
62%|███████ | 191/310 [1:23:47<52:25, 26.43s/it]
62%|███████ | 192/310 [1:24:13<52:10, 26.53s/it]
62%|███████ | 193/310 [1:24:39<51:25, 26.37s/it]
63%|███████ | 194/310 [1:25:03<49:07, 25.41s/it]
63%|███████ | 195/310 [1:25:23<45:39, 23.82s/it]
63%|███████ | 196/310 [1:25:56<50:40, 26.67s/it]
64%|███████ | 197/310 [1:26:22<49:55, 26.51s/it]
64%|███████ | 198/310 [1:26:54<52:39, 28.21s/it]
64%|███████ | 199/310 [1:27:22<51:38, 27.91s/it]
65%|███████ | 200/310 [1:27:48<50:19, 27.45s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. |
|
warnings.warn( |
|
65%|███████ | 201/310 [1:28:20<52:18, 28.80s/it]
65%|███████ | 202/310 [1:28:54<54:55, 30.51s/it]
65%|███████ | 203/310 [1:29:19<51:27, 28.86s/it]
66%|███████ | 204/310 [1:29:44<48:51, 27.66s/it]
66%|███████ | 205/310 [1:30:03<43:52, 25.07s/it]
66%|███████ | 206/310 [1:30:33<46:02, 26.56s/it]
67%|███████ | 207/310 [1:30:54<42:40, 24.86s/it]
67%|███████ | 208/310 [1:31:19<42:27, 24.98s/it]
67%|███████ | 209/310 [1:31:42<40:36, 24.12s/it]
68%|███████ | 210/310 [1:32:04<39:18, 23.58s/it]
68%|███████ | 211/310 [1:32:33<41:53, 25.39s/it]
68%|███████ | 212/310 [1:32:55<39:35, 24.24s/it]
69%|███████ | 213/310 [1:33:23<41:10, 25.47s/it]
69%|███████ | 214/310 [1:33:44<38:38, 24.15s/it]
69%|███████ | 215/310 [1:34:15<41:21, 26.12s/it]
70%|███████ | 216/310 [1:34:44<42:21, 27.04s/it]
70%|███████ | 217/310 [1:35:19<45:14, 29.19s/it]
70%|███████ | 218/310 [1:35:43<42:47, 27.91s/it]
71%|███████ | 219/310 [1:36:14<43:24, 28.62s/it]
71%|███████ | 220/310 [1:36:41<42:23, 28.26s/it]
71%|████████ | 221/310 [1:37:07<40:41, 27.43s/it]
72%|████████ | 222/310 [1:37:24<35:37, 24.29s/it]
72%|████████ | 223/310 [1:37:52<37:06, 25.59s/it]
72%|████████ | 224/310 [1:38:17<36:06, 25.19s/it]
73%|████████ | 225/310 [1:38:43<36:09, 25.52s/it]
73%|████████ | 226/310 [1:39:19<40:25, 28.87s/it]
73%|████████ | 227/310 [1:39:50<40:40, 29.41s/it]
74%|████████ | 228/310 [1:40:17<39:17, 28.75s/it]
74%|████████ | 229/310 [1:40:40<36:09, 26.78s/it]
74%|████████ | 230/310 [1:41:07<36:02, 27.03s/it]
75%|████████ | 231/310 [1:41:46<40:23, 30.68s/it]
75%|████████ | 232/310 [1:42:13<38:25, 29.56s/it]
75%|████████ | 233/310 [1:42:43<38:04, 29.67s/it]
75%|████████ | 234/310 [1:43:04<34:04, 26.90s/it]
76%|████████ | 235/310 [1:43:31<33:49, 27.06s/it]
76%|████████ | 236/310 [1:44:01<34:31, 27.99s/it]
76%|████████ | 237/310 [1:44:27<33:21, 27.42s/it]
77%|████████ | 238/310 [1:44:47<30:01, 25.02s/it]
77%|████████ | 239/310 [1:45:13<30:12, 25.53s/it]
77%|████████ | 240/310 [1:45:37<29:00, 24.87s/it]
78%|████████ | 241/310 [1:46:01<28:26, 24.72s/it]
78%|████████ | 242/310 [1:46:29<29:02, 25.63s/it]
{'loss': 0.5152, 'grad_norm': 3.3867886066436768, 'learning_rate': 0.00031512341313713474, 'epoch': 5.25} |
|
{'loss': 0.3792, 'grad_norm': 2.431579351425171, 'learning_rate': 0.0003126640116127244, 'epoch': 5.28} |
|
{'loss': 0.4987, 'grad_norm': 2.0915236473083496, 'learning_rate': 0.0003101793962904205, 'epoch': 5.31} |
|
{'loss': 0.4169, 'grad_norm': 2.8675644397735596, 'learning_rate': 0.0003076701232181365, 'epoch': 5.34} |
|
{'loss': 0.429, 'grad_norm': 2.8221242427825928, 'learning_rate': 0.00030513675396210094, 'epoch': 5.38} |
|
{'loss': 0.5472, 'grad_norm': 2.428004264831543, 'learning_rate': 0.00030257985548118126, 'epoch': 5.41} |
|
{'loss': 0.4324, 'grad_norm': 2.6007370948791504, 'learning_rate': 0.00030000000000000003, 'epoch': 5.44} |
|
{'loss': 0.5068, 'grad_norm': 3.2577507495880127, 'learning_rate': 0.00029739776488087345, 'epoch': 5.47} |
|
{'loss': 0.5063, 'grad_norm': 2.3559985160827637, 'learning_rate': 0.0002947737324945997, 'epoch': 5.5} |
|
{'loss': 0.373, 'grad_norm': 2.5560073852539062, 'learning_rate': 0.0002921284900901265, 'epoch': 5.54} |
|
{'loss': 0.4752, 'grad_norm': 2.8501780033111572, 'learning_rate': 0.00028946262966312653, 'epoch': 5.57} |
|
{'loss': 0.473, 'grad_norm': 2.443272352218628, 'learning_rate': 0.00028677674782351165, 'epoch': 5.6} |
|
{'loss': 0.4048, 'grad_norm': 3.037306070327759, 'learning_rate': 0.00028407144566191313, 'epoch': 5.63} |
|
{'loss': 0.3663, 'grad_norm': 2.3580427169799805, 'learning_rate': 0.0002813473286151601, 'epoch': 5.66} |
|
{'loss': 0.4945, 'grad_norm': 3.0282719135284424, 'learning_rate': 0.00027860500633078477, 'epoch': 5.7} |
|
{'loss': 0.382, 'grad_norm': 2.7548229694366455, 'learning_rate': 0.0002758450925305857, 'epoch': 5.73} |
|
{'loss': 0.417, 'grad_norm': 2.615589141845703, 'learning_rate': 0.00027306820487327905, 'epoch': 5.76} |
|
{'loss': 0.4628, 'grad_norm': 3.054159641265869, 'learning_rate': 0.0002702749648162686, 'epoch': 5.79} |
|
{'loss': 0.5489, 'grad_norm': 2.641526699066162, 'learning_rate': 0.00026746599747656607, 'epoch': 5.82} |
|
{'loss': 0.4926, 'grad_norm': 2.968789577484131, 'learning_rate': 0.00026464193149089205, 'epoch': 5.86} |
|
{'loss': 0.4561, 'grad_norm': 2.786257743835449, 'learning_rate': 0.00026180339887498953, 'epoch': 5.89} |
|
{'loss': 0.4068, 'grad_norm': 2.4414594173431396, 'learning_rate': 0.00025895103488218085, 'epoch': 5.92} |
|
{'loss': 0.5115, 'grad_norm': 2.454012870788574, 'learning_rate': 0.00025608547786120056, 'epoch': 5.95} |
|
{'loss': 0.4792, 'grad_norm': 2.3952438831329346, 'learning_rate': 0.00025320736911333503, 'epoch': 5.98} |
|
{'loss': 0.4068, 'grad_norm': 1.9571117162704468, 'learning_rate': 0.0002503173527489017, 'epoch': 6.02} |
|
{'loss': 0.2905, 'grad_norm': 2.0324618816375732, 'learning_rate': 0.00024741607554309953, 'epoch': 6.05} |
|
{'loss': 0.2439, 'grad_norm': 1.9604227542877197, 'learning_rate': 0.0002445041867912629, 'epoch': 6.08} |
|
{'loss': 0.2561, 'grad_norm': 2.384906768798828, 'learning_rate': 0.00024158233816355185, 'epoch': 6.11} |
|
{'loss': 0.1931, 'grad_norm': 2.3552908897399902, 'learning_rate': 0.00023865118355911066, 'epoch': 6.14} |
|
{'loss': 0.212, 'grad_norm': 2.022010087966919, 'learning_rate': 0.00023571137895972733, 'epoch': 6.18} |
|
{'loss': 0.2427, 'grad_norm': 2.2903988361358643, 'learning_rate': 0.00023276358228302755, 'epoch': 6.21} |
|
{'loss': 0.2294, 'grad_norm': 3.257195234298706, 'learning_rate': 0.00022980845323523487, 'epoch': 6.24} |
|
{'loss': 0.3048, 'grad_norm': 2.0269675254821777, 'learning_rate': 0.0002268466531635311, 'epoch': 6.27} |
|
{'loss': 0.2448, 'grad_norm': 2.6209371089935303, 'learning_rate': 0.00022387884490804885, 'epoch': 6.3} |
|
{'loss': 0.2997, 'grad_norm': 1.9519191980361938, 'learning_rate': 0.00022090569265353072, 'epoch': 6.34} |
|
{'loss': 0.3085, 'grad_norm': 2.1031925678253174, 'learning_rate': 0.00021792786178068672, 'epoch': 6.37} |
|
{'loss': 0.2777, 'grad_norm': 2.3735647201538086, 'learning_rate': 0.00021494601871728487, 'epoch': 6.4} |
|
{'loss': 0.3242, 'grad_norm': 2.3519961833953857, 'learning_rate': 0.00021196083078900687, 'epoch': 6.43} |
|
{'loss': 0.3699, 'grad_norm': 2.346907615661621, 'learning_rate': 0.00020897296607010301, 'epoch': 6.46} |
|
{'loss': 0.2652, 'grad_norm': 2.7333357334136963, 'learning_rate': 0.00020598309323387973, 'epoch': 6.5} |
|
{'loss': 0.2778, 'grad_norm': 2.594409227371216, 'learning_rate': 0.00020299188140305275, 'epoch': 6.53} |
|
{'loss': 0.1466, 'grad_norm': 2.1327905654907227, 'learning_rate': 0.0002, 'epoch': 6.56} |
|
{'loss': 0.3872, 'grad_norm': 2.7535364627838135, 'learning_rate': 0.00019700811859694732, 'epoch': 6.59} |
|
{'loss': 0.1841, 'grad_norm': 2.161400079727173, 'learning_rate': 0.00019401690676612037, 'epoch': 6.62} |
|
{'loss': 0.286, 'grad_norm': 2.3778538703918457, 'learning_rate': 0.00019102703392989709, 'epoch': 6.66} |
|
{'loss': 0.2268, 'grad_norm': 2.713294744491577, 'learning_rate': 0.00018803916921099315, 'epoch': 6.69} |
|
{'loss': 0.2228, 'grad_norm': 2.3581082820892334, 'learning_rate': 0.00018505398128271515, 'epoch': 6.72} |
|
{'loss': 0.2938, 'grad_norm': 1.9641681909561157, 'learning_rate': 0.00018207213821931333, 'epoch': 6.75} |
|
{'loss': 0.2232, 'grad_norm': 2.6628777980804443, 'learning_rate': 0.00017909430734646935, 'epoch': 6.78} |
|
{'loss': 0.3172, 'grad_norm': 2.0716280937194824, 'learning_rate': 0.00017612115509195117, 'epoch': 6.82} |
|
{'loss': 0.2279, 'grad_norm': 2.550506830215454, 'learning_rate': 0.00017315334683646897, 'epoch': 6.85} |
|
{'loss': 0.3422, 'grad_norm': 2.055569887161255, 'learning_rate': 0.0001701915467647651, 'epoch': 6.88} |
|
{'loss': 0.2958, 'grad_norm': 2.0579302310943604, 'learning_rate': 0.00016723641771697247, 'epoch': 6.91} |
|
{'loss': 0.3472, 'grad_norm': 2.271643877029419, 'learning_rate': 0.00016428862104027268, 'epoch': 6.94} |
|
{'loss': 0.2379, 'grad_norm': 2.0624279975891113, 'learning_rate': 0.00016134881644088938, 'epoch': 6.98} |
|
{'loss': 0.3121, 'grad_norm': 2.3407957553863525, 'learning_rate': 0.00015841766183644817, 'epoch': 7.01} |
|
{'loss': 0.1415, 'grad_norm': 1.4568068981170654, 'learning_rate': 0.00015549581320873715, 'epoch': 7.04} |
|
{'loss': 0.1288, 'grad_norm': 1.3108962774276733, 'learning_rate': 0.00015258392445690052, 'epoch': 7.07} |
|
{'loss': 0.0594, 'grad_norm': 1.741568684577942, 'learning_rate': 0.0001496826472510983, 'epoch': 7.1} |
|
{'loss': 0.1379, 'grad_norm': 1.848054051399231, 'learning_rate': 0.00014679263088666499, 'epoch': 7.14} |
|
{'loss': 0.1295, 'grad_norm': 1.8237534761428833, 'learning_rate': 0.0001439145221387995, 'epoch': 7.17} |
|
{'loss': 0.1418, 'grad_norm': 1.8942619562149048, 'learning_rate': 0.00014104896511781914, 'epoch': 7.2} |
|
{'loss': 0.2078, 'grad_norm': 1.967505931854248, 'learning_rate': 0.00013819660112501054, 'epoch': 7.23} |
|
{'loss': 0.1312, 'grad_norm': 1.5713609457015991, 'learning_rate': 0.00013535806850910803, 'epoch': 7.26} |
|
{'loss': 0.1201, 'grad_norm': 1.8843345642089844, 'learning_rate': 0.00013253400252343403, 'epoch': 7.3} |
|
{'loss': 0.0846, 'grad_norm': 1.7415904998779297, 'learning_rate': 0.00012972503518373144, 'epoch': 7.33} |
|
{'loss': 0.1257, 'grad_norm': 1.541115403175354, 'learning_rate': 0.000126931795126721, 'epoch': 7.36} |
|
{'loss': 0.2734, 'grad_norm': 1.7079006433486938, 'learning_rate': 0.00012415490746941433, 'epoch': 7.39} |
|
{'loss': 0.1199, 'grad_norm': 1.7351499795913696, 'learning_rate': 0.0001213949936692153, 'epoch': 7.42} |
|
{'loss': 0.1314, 'grad_norm': 1.570528268814087, 'learning_rate': 0.00011865267138483999, 'epoch': 7.46} |
|
{'loss': 0.1073, 'grad_norm': 1.8326702117919922, 'learning_rate': 0.00011592855433808694, 'epoch': 7.49} |
|
{'loss': 0.1299, 'grad_norm': 1.810351848602295, 'learning_rate': 0.00011322325217648839, 'epoch': 7.52} |
|
{'loss': 0.1762, 'grad_norm': 2.8609297275543213, 'learning_rate': 0.00011053737033687346, 'epoch': 7.55} |
|
{'loss': 0.113, 'grad_norm': 1.4371622800827026, 'learning_rate': 0.00010787150990987359, 'epoch': 7.58} |
|
{'loss': 0.1056, 'grad_norm': 1.8663928508758545, 'learning_rate': 0.00010522626750540028, 'epoch': 7.62} |
|
{'loss': 0.1077, 'grad_norm': 2.7775816917419434, 'learning_rate': 0.00010260223511912654, 'epoch': 7.65} |
|
{'loss': 0.0853, 'grad_norm': 1.771044135093689, 'learning_rate': 0.00010000000000000005, 'epoch': 7.68} |
|
{'loss': 0.1255, 'grad_norm': 1.959417462348938, 'learning_rate': 9.74201445188188e-05, 'epoch': 7.71} |
|
78%|ββββββββ | 242/310 [1:46:29<29:02, 25.63s/it]
78%|ββββββββ | 243/310 [1:47:01<30:36, 27.41s/it]
78%|ββββββββ | 243/310 [1:47:01<30:36, 27.41s/it]
79%|ββββββββ | 244/310 [1:47:24<28:57, 26.33s/it]
79%|ββββββββ | 244/310 [1:47:24<28:57, 26.33s/it]
79%|ββββββββ | 245/310 [1:47:51<28:29, 26.30s/it]
79%|ββββββββ | 245/310 [1:47:51<28:29, 26.30s/it]
79%|ββββββββ | 246/310 [1:48:26<30:49, 28.90s/it]
79%|ββββββββ | 246/310 [1:48:26<30:49, 28.90s/it]
80%|ββββββββ | 247/310 [1:48:48<28:17, 26.95s/it]
80%|ββββββββ | 247/310 [1:48:48<28:17, 26.95s/it]
80%|ββββββββ | 248/310 [1:49:10<26:21, 25.50s/it]
80%|ββββββββ | 248/310 [1:49:10<26:21, 25.50s/it]
80%|ββββββββ | 249/310 [1:49:33<25:16, 24.87s/it]
80%|ββββββββ | 249/310 [1:49:33<25:16, 24.87s/it]
81%|ββββββββ | 250/310 [1:49:55<23:45, 23.76s/it]
81%|ββββββββ | 250/310 [1:49:55<23:45, 23.76s/it]
81%|ββββββββ | 251/310 [1:50:22<24:21, 24.77s/it]
81%|ββββββββ | 251/310 [1:50:22<24:21, 24.77s/it]
81%|βββββββββ | 252/310 [1:50:54<26:13, 27.13s/it]
81%|βββββββββ | 252/310 [1:50:54<26:13, 27.13s/it]
82%|βββββββββ | 253/310 [1:51:18<24:39, 25.96s/it]
82%|βββββββββ | 253/310 [1:51:18<24:39, 25.96s/it]
82%|βββββββββ | 254/310 [1:51:47<25:06, 26.89s/it]
82%|βββββββββ | 254/310 [1:51:47<25:06, 26.89s/it]
82%|βββββββββ | 255/310 [1:52:09<23:22, 25.50s/it]
82%|βββββββββ | 255/310 [1:52:09<23:22, 25.50s/it]
83%|βββββββββ | 256/310 [1:52:34<22:52, 25.41s/it]
83%|βββββββββ | 256/310 [1:52:34<22:52, 25.41s/it]
83%|βββββββββ | 257/310 [1:52:57<21:47, 24.67s/it]
83%|βββββββββ | 257/310 [1:52:57<21:47, 24.67s/it]
83%|βββββββββ | 258/310 [1:53:26<22:32, 26.02s/it]
83%|βββββββββ | 258/310 [1:53:26<22:32, 26.02s/it]
84%|βββββββββ | 259/310 [1:53:53<22:19, 26.26s/it]
84%|βββββββββ | 259/310 [1:53:53<22:19, 26.26s/it]
84%|βββββββββ | 260/310 [1:54:17<21:20, 25.61s/it]
84%|βββββββββ | 260/310 [1:54:17<21:20, 25.61s/it]
84%|βββββββββ | 261/310 [1:54:37<19:34, 23.96s/it]
84%|βββββββββ | 261/310 [1:54:37<19:34, 23.96s/it]
85%|βββββββββ | 262/310 [1:55:02<19:26, 24.30s/it]
85%|βββββββββ | 262/310 [1:55:02<19:26, 24.30s/it]
85%|βββββββββ | 263/310 [1:55:33<20:24, 26.06s/it]
85%|βββββββββ | 263/310 [1:55:33<20:24, 26.06s/it]
85%|βββββββββ | 264/310 [1:56:01<20:30, 26.76s/it]
85%|βββββββββ | 264/310 [1:56:01<20:30, 26.76s/it]
85%|βββββββββ | 265/310 [1:56:31<20:47, 27.72s/it]
85%|βββββββββ | 265/310 [1:56:31<20:47, 27.72s/it]
86%|βββββββββ | 266/310 [1:56:52<18:54, 25.78s/it]
86%|βββββββββ | 266/310 [1:56:52<18:54, 25.78s/it]
86%|βββββββββ | 267/310 [1:57:22<19:19, 26.95s/it]
86%|βββββββββ | 267/310 [1:57:22<19:19, 26.95s/it]
86%|βββββββββ | 268/310 [1:57:48<18:43, 26.74s/it]
86%|βββββββββ | 268/310 [1:57:48<18:43, 26.74s/it]
87%|βββββββββ | 269/310 [1:58:15<18:21, 26.86s/it]
87%|βββββββββ | 269/310 [1:58:15<18:21, 26.86s/it]
87%|βββββββββ | 270/310 [1:58:40<17:34, 26.37s/it]
87%|βββββββββ | 270/310 [1:58:40<17:34, 26.37s/it]
87%|βββββββββ | 271/310 [1:59:02<16:08, 24.83s/it]
87%|βββββββββ | 271/310 [1:59:02<16:08, 24.83s/it]
88%|βββββββββ | 272/310 [1:59:28<16:00, 25.27s/it]
88%|βββββββββ | 272/310 [1:59:28<16:00, 25.27s/it]
88%|βββββββββ | 273/310 [2:00:01<17:01, 27.61s/it]
88%|βββββββββ | 273/310 [2:00:01<17:01, 27.61s/it]
88%|βββββββββ | 274/310 [2:00:29<16:41, 27.82s/it]
 88%|████████▊ | 274/310 [2:00:29<16:41, 27.82s/it]
 89%|████████▊ | 275/310 [2:00:53<15:31, 26.61s/it]
 89%|████████▉ | 276/310 [2:01:21<15:14, 26.90s/it]
 89%|████████▉ | 277/310 [2:01:48<14:47, 26.89s/it]
 90%|████████▉ | 278/310 [2:02:12<13:55, 26.10s/it]
 90%|█████████ | 279/310 [2:02:39<13:40, 26.45s/it]
 90%|█████████ | 280/310 [2:03:12<14:12, 28.43s/it]
 91%|█████████ | 281/310 [2:03:38<13:22, 27.69s/it]
 91%|█████████ | 282/310 [2:04:08<13:16, 28.43s/it]
 91%|█████████▏| 283/310 [2:04:32<12:08, 26.99s/it]
 92%|█████████▏| 284/310 [2:04:55<11:07, 25.68s/it]
 92%|█████████▏| 285/310 [2:05:22<10:52, 26.08s/it]
 92%|█████████▏| 286/310 [2:05:44<10:00, 25.04s/it]
 93%|█████████▎| 287/310 [2:06:09<09:37, 25.12s/it]
 93%|█████████▎| 288/310 [2:06:40<09:49, 26.80s/it]
 93%|█████████▎| 289/310 [2:07:02<08:50, 25.28s/it]
 94%|█████████▎| 290/310 [2:07:19<07:39, 22.95s/it]
 94%|█████████▍| 291/310 [2:07:42<07:13, 22.83s/it]
 94%|█████████▍| 292/310 [2:08:13<07:34, 25.26s/it]
 95%|█████████▍| 293/310 [2:08:35<06:52, 24.26s/it]
 95%|█████████▍| 294/310 [2:08:58<06:23, 23.99s/it]
 95%|█████████▌| 295/310 [2:09:32<06:45, 27.02s/it]
 95%|█████████▌| 296/310 [2:10:03<06:32, 28.06s/it]
 96%|█████████▌| 297/310 [2:10:38<06:33, 30.28s/it]
 96%|█████████▌| 298/310 [2:10:56<05:19, 26.66s/it]
 96%|█████████▋| 299/310 [2:11:17<04:34, 24.98s/it]
 97%|█████████▋| 300/310 [2:11:43<04:12, 25.26s/it]
/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. |
|
warnings.warn( |
|
 97%|█████████▋| 301/310 [2:12:14<04:01, 26.78s/it]
 97%|█████████▋| 302/310 [2:12:43<03:40, 27.60s/it]
 98%|█████████▊| 303/310 [2:13:06<03:03, 26.18s/it]
 98%|█████████▊| 304/310 [2:13:35<02:41, 26.95s/it]
 98%|█████████▊| 305/310 [2:13:59<02:10, 26.04s/it]
 99%|█████████▊| 306/310 [2:14:24<01:43, 25.79s/it]
 99%|█████████▉| 307/310 [2:14:58<01:24, 28.14s/it]
 99%|█████████▉| 308/310 [2:15:17<00:51, 25.67s/it]
100%|█████████▉| 309/310 [2:15:45<00:26, 26.19s/it]
100%|██████████| 310/310 [2:16:07<00:00, 24.84s/it]
100%|██████████| 310/310 [2:16:08<00:00, 26.35s/it] |
|
{'loss': 0.1506, 'grad_norm': 1.9411389827728271, 'learning_rate': 9.486324603789904e-05, 'epoch': 7.74} |
|
{'loss': 0.1828, 'grad_norm': 1.8620474338531494, 'learning_rate': 9.232987678186357e-05, 'epoch': 7.78} |
|
{'loss': 0.1229, 'grad_norm': 1.724796175956726, 'learning_rate': 8.982060370957952e-05, 'epoch': 7.81} |
|
{'loss': 0.113, 'grad_norm': 2.0517826080322266, 'learning_rate': 8.733598838727559e-05, 'epoch': 7.84} |
|
{'loss': 0.2102, 'grad_norm': 1.6916502714157104, 'learning_rate': 8.487658686286533e-05, 'epoch': 7.87} |
|
{'loss': 0.1077, 'grad_norm': 1.7149126529693604, 'learning_rate': 8.24429495415054e-05, 'epoch': 7.9} |
|
{'loss': 0.1172, 'grad_norm': 2.2705140113830566, 'learning_rate': 8.003562106241726e-05, 'epoch': 7.94} |
|
{'loss': 0.1168, 'grad_norm': 1.9180052280426025, 'learning_rate': 7.765514017699871e-05, 'epoch': 7.97} |
|
{'loss': 0.0966, 'grad_norm': 1.7769675254821777, 'learning_rate': 7.530203962825331e-05, 'epoch': 8.0} |
|
{'loss': 0.0699, 'grad_norm': 0.9119483232498169, 'learning_rate': 7.297684603156425e-05, 'epoch': 8.03} |
|
{'loss': 0.0902, 'grad_norm': 1.0115219354629517, 'learning_rate': 7.06800797568401e-05, 'epoch': 8.06} |
|
{'loss': 0.04, 'grad_norm': 0.8050028681755066, 'learning_rate': 6.841225481205749e-05, 'epoch': 8.1} |
|
{'loss': 0.0544, 'grad_norm': 0.963114321231842, 'learning_rate': 6.617387872822842e-05, 'epoch': 8.13} |
|
{'loss': 0.0652, 'grad_norm': 0.960823118686676, 'learning_rate': 6.396545244581608e-05, 'epoch': 8.16} |
|
{'loss': 0.0592, 'grad_norm': 1.4503118991851807, 'learning_rate': 6.178747020262707e-05, 'epoch': 8.19} |
|
{'loss': 0.0331, 'grad_norm': 1.052724838256836, 'learning_rate': 5.964041942320171e-05, 'epoch': 8.22} |
|
{'loss': 0.079, 'grad_norm': 1.4363080263137817, 'learning_rate': 5.752478060973108e-05, 'epoch': 8.26} |
|
{'loss': 0.0516, 'grad_norm': 1.039581298828125, 'learning_rate': 5.544102723452171e-05, 'epoch': 8.29} |
|
{'loss': 0.0256, 'grad_norm': 1.027633547782898, 'learning_rate': 5.338962563403478e-05, 'epoch': 8.32} |
|
{'loss': 0.0444, 'grad_norm': 1.557195782661438, 'learning_rate': 5.1371034904521134e-05, 'epoch': 8.35} |
|
{'loss': 0.0377, 'grad_norm': 1.520973801612854, 'learning_rate': 4.938570679927783e-05, 'epoch': 8.38} |
|
{'loss': 0.0579, 'grad_norm': 1.5300484895706177, 'learning_rate': 4.74340856275467e-05, 'epoch': 8.42} |
|
{'loss': 0.0785, 'grad_norm': 1.3584799766540527, 'learning_rate': 4.551660815508012e-05, 'epoch': 8.45} |
|
{'loss': 0.0606, 'grad_norm': 1.3045412302017212, 'learning_rate': 4.363370350639404e-05, 'epoch': 8.48} |
|
{'loss': 0.0365, 'grad_norm': 0.989949643611908, 'learning_rate': 4.178579306873181e-05, 'epoch': 8.51} |
|
{'loss': 0.0665, 'grad_norm': 1.7368141412734985, 'learning_rate': 3.997329039775877e-05, 'epoch': 8.54} |
|
{'loss': 0.0564, 'grad_norm': 1.9525558948516846, 'learning_rate': 3.819660112501053e-05, 'epoch': 8.58} |
|
{'loss': 0.0707, 'grad_norm': 1.1727070808410645, 'learning_rate': 3.645612286711373e-05, 'epoch': 8.61} |
|
{'loss': 0.0594, 'grad_norm': 1.271376132965088, 'learning_rate': 3.4752245136801065e-05, 'epoch': 8.64} |
|
{'loss': 0.0385, 'grad_norm': 1.276285171508789, 'learning_rate': 3.3085349255739474e-05, 'epoch': 8.67} |
|
{'loss': 0.0509, 'grad_norm': 1.3125935792922974, 'learning_rate': 3.1455808269192166e-05, 'epoch': 8.7} |
|
{'loss': 0.0698, 'grad_norm': 1.5089739561080933, 'learning_rate': 2.986398686253211e-05, 'epoch': 8.74} |
|
{'loss': 0.0439, 'grad_norm': 1.0176442861557007, 'learning_rate': 2.831024127962678e-05, 'epoch': 8.77} |
|
{'loss': 0.0304, 'grad_norm': 1.0032261610031128, 'learning_rate': 2.679491924311226e-05, 'epoch': 8.8} |
|
{'loss': 0.0614, 'grad_norm': 1.1879023313522339, 'learning_rate': 2.531835987657407e-05, 'epoch': 8.83} |
|
{'loss': 0.0512, 'grad_norm': 1.0695905685424805, 'learning_rate': 2.38808936286524e-05, 'epoch': 8.86} |
|
{'loss': 0.0379, 'grad_norm': 0.9420713782310486, 'learning_rate': 2.248284219908918e-05, 'epoch': 8.9} |
|
{'loss': 0.0509, 'grad_norm': 1.0384514331817627, 'learning_rate': 2.1124518466732667e-05, 'epoch': 8.93} |
|
{'loss': 0.076, 'grad_norm': 1.2165974378585815, 'learning_rate': 1.9806226419516192e-05, 'epoch': 8.96} |
|
{'loss': 0.0537, 'grad_norm': 1.0320138931274414, 'learning_rate': 1.8528261086427024e-05, 'epoch': 8.99} |
|
{'loss': 0.0416, 'grad_norm': 0.9421552419662476, 'learning_rate': 1.7290908471479805e-05, 'epoch': 9.02} |
|
{'loss': 0.0333, 'grad_norm': 0.9624818563461304, 'learning_rate': 1.6094445489709885e-05, 'epoch': 9.06} |
|
{'loss': 0.0139, 'grad_norm': 0.654150128364563, 'learning_rate': 1.493913990520066e-05, 'epoch': 9.09} |
|
{'loss': 0.0619, 'grad_norm': 0.719512939453125, 'learning_rate': 1.3825250271159173e-05, 'epoch': 9.12} |
|
{'loss': 0.0102, 'grad_norm': 0.4500947892665863, 'learning_rate': 1.275302587205256e-05, 'epoch': 9.15} |
|
{'loss': 0.0153, 'grad_norm': 0.7063645720481873, 'learning_rate': 1.1722706667819383e-05, 'epoch': 9.18} |
|
{'loss': 0.0351, 'grad_norm': 0.7226387858390808, 'learning_rate': 1.073452324016715e-05, 'epoch': 9.22} |
|
{'loss': 0.0124, 'grad_norm': 0.7622669339179993, 'learning_rate': 9.788696740969295e-06, 'epoch': 9.25} |
|
{'loss': 0.009, 'grad_norm': 0.6491491794586182, 'learning_rate': 8.885438842771843e-06, 'epoch': 9.28} |
|
{'loss': 0.0117, 'grad_norm': 0.5820807218551636, 'learning_rate': 8.024951691422212e-06, 'epoch': 9.31} |
|
{'loss': 0.0349, 'grad_norm': 0.9873208999633789, 'learning_rate': 7.2074278608293525e-06, 'epoch': 9.34} |
|
{'loss': 0.0107, 'grad_norm': 0.5595976114273071, 'learning_rate': 6.4330503098667175e-06, 'epoch': 9.38} |
|
{'loss': 0.0193, 'grad_norm': 0.6088156700134277, 'learning_rate': 5.701992341426499e-06, 'epoch': 9.41} |
|
{'loss': 0.0409, 'grad_norm': 0.7730354070663452, 'learning_rate': 5.0144175636352765e-06, 'epoch': 9.44} |
|
{'loss': 0.0358, 'grad_norm': 0.6233725547790527, 'learning_rate': 4.370479853238863e-06, 'epoch': 9.47} |
|
{'loss': 0.0633, 'grad_norm': 0.7955800890922546, 'learning_rate': 3.770323321165492e-06, 'epoch': 9.5} |
|
{'loss': 0.0093, 'grad_norm': 0.4983786344528198, 'learning_rate': 3.2140822802740668e-06, 'epoch': 9.54} |
|
{'loss': 0.0258, 'grad_norm': 0.6332207322120667, 'learning_rate': 2.701881215295732e-06, 'epoch': 9.57} |
|
{'loss': 0.0278, 'grad_norm': 0.7181130647659302, 'learning_rate': 2.2338347549742956e-06, 'epoch': 9.6} |
|
{'loss': 0.0154, 'grad_norm': 0.6782166957855225, 'learning_rate': 1.81004764641306e-06, 'epoch': 9.63} |
|
{'loss': 0.0361, 'grad_norm': 0.6509042978286743, 'learning_rate': 1.4306147316325291e-06, 'epoch': 9.66} |
|
{'loss': 0.0145, 'grad_norm': 0.7593727707862854, 'learning_rate': 1.0956209263453421e-06, 'epoch': 9.7} |
|
{'loss': 0.0321, 'grad_norm': 0.6939243674278259, 'learning_rate': 8.051412009521864e-07, 'epoch': 9.73} |
|
{'loss': 0.0219, 'grad_norm': 0.6120842099189758, 'learning_rate': 5.592405637639741e-07, 'epoch': 9.76} |
|
{'loss': 0.0186, 'grad_norm': 0.6328569054603577, 'learning_rate': 3.5797404645296906e-07, 'epoch': 9.79} |
|
{'loss': 0.0573, 'grad_norm': 0.7918028235435486, 'learning_rate': 2.0138669173708213e-07, 'epoch': 9.82} |
|
{'loss': 0.0093, 'grad_norm': 0.41346606612205505, 'learning_rate': 8.951354329933547e-08, 'epoch': 9.86} |
|
{'loss': 0.037, 'grad_norm': 0.6492690443992615, 'learning_rate': 2.2379637945313392e-08, 'epoch': 9.89} |
|
{'loss': 0.0152, 'grad_norm': 0.7126352190971375, 'learning_rate': 0.0, 'epoch': 9.92} |
|
{'train_runtime': 8177.5611, 'train_samples_per_second': 1.223, 'train_steps_per_second': 0.038, 'train_loss': 0.7486070978004605, 'epoch': 9.92} |
|
wandb: 0.055 MB of 0.055 MB uploaded
wandb: |
|
wandb: Run history: |
|
wandb: train/epoch (sparkline not recoverable) |
|
wandb: train/global_step (sparkline not recoverable) |
|
wandb: train/grad_norm (sparkline not recoverable) |
|
wandb: train/learning_rate (sparkline not recoverable) |
|
wandb: train/loss (sparkline not recoverable) |
|
wandb: |
|
wandb: Run summary: |
|
wandb: total_flos 4.638730387893289e+17 |
|
wandb: train/epoch 9.92 |
|
wandb: train/global_step 310 |
|
wandb: train/grad_norm 0.71264 |
|
wandb: train/learning_rate 0.0 |
|
wandb: train/loss 0.0152 |
|
wandb: train_loss 0.74861 |
|
wandb: train_runtime 8177.5611 |
|
wandb: train_samples_per_second 1.223 |
|
wandb: train_steps_per_second 0.038 |
|
wandb: |
|
wandb: 🚀 View run gemma-2_unnatural_instruction_lima at: https://wandb.ai/nus_keyu/llm-attack/runs/p7b28tvc |
|
wandb: ⚡ View job at https://wandb.ai/nus_keyu/llm-attack/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjE0ODYxODc2Mg==/version_details/v33 |
|
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
|
wandb: Find logs at: ./wandb/run-20241024_084830-p7b28tvc/logs |
|
|