2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Training Model
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Translator(
  (encoder): EncoderLSTM(
    (embedding): Embedding(111, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
  )
  (decoder): DecoderLSTM(
    (embedding): Embedding(105, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
    (attention): DotProductAttention(
      (softmax): Softmax(dim=-1)
      (combined2hidden): Sequential(
        (0): Linear(in_features=1024, out_features=512, bias=True)
        (1): ReLU()
      )
    )
    (hidden2vocab): Linear(in_features=512, out_features=105, bias=True)
    (log_softmax): LogSoftmax(dim=-1)
  )
)
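The module dump above can be re-sketched in PyTorch to make the shapes easier to read. The class names, layer names, and dimensions below come directly from the printout; the forward logic (single-step decoding with dot-product attention over the encoder outputs) is an assumption for illustration, not the actual training code.

```python
import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    def __init__(self, hidden: int = 512):
        super().__init__()
        self.softmax = nn.Softmax(dim=-1)
        self.combined2hidden = nn.Sequential(
            nn.Linear(2 * hidden, hidden),  # 1024 -> 512, as printed
            nn.ReLU(),
        )

    def forward(self, query, keys):
        # query: (batch, 1, hidden); keys: (batch, src_len, hidden)
        weights = self.softmax(torch.bmm(query, keys.transpose(1, 2)))
        context = torch.bmm(weights, keys)               # (batch, 1, hidden)
        return self.combined2hidden(torch.cat([query, context], dim=-1))

class EncoderLSTM(nn.Module):
    def __init__(self, vocab: int = 111, embed: int = 300, hidden: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab, embed, padding_idx=0)
        self.dropout = nn.Dropout(p=0.1)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)

    def forward(self, src):
        # Returns (outputs, (h, c)) from the LSTM.
        return self.lstm(self.dropout(self.embedding(src)))

class DecoderLSTM(nn.Module):
    def __init__(self, vocab: int = 105, embed: int = 300, hidden: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab, embed, padding_idx=0)
        self.dropout = nn.Dropout(p=0.1)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.attention = DotProductAttention(hidden)
        self.hidden2vocab = nn.Linear(hidden, vocab)
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, tgt_step, state, encoder_outputs):
        # tgt_step: (batch, 1) token ids for one decoding step
        output, state = self.lstm(self.dropout(self.embedding(tgt_step)), state)
        attended = self.attention(output, encoder_outputs)
        return self.log_softmax(self.hidden2vocab(attended)), state

class Translator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = EncoderLSTM()
        self.decoder = DecoderLSTM()
```

All sizes mirror the printout: 111 source symbols, 105 target symbols, 300-dimensional embeddings, and a 512-unit hidden state.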
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Training Hyperparameters:
2024-07-30 02:17:46,750  - max_epochs: 10
2024-07-30 02:17:46,750  - learning_rate: 0.001
2024-07-30 02:17:46,750  - batch_size: 128
2024-07-30 02:17:46,750  - patience: 5
2024-07-30 02:17:46,750  - scheduler_patience: 3
2024-07-30 02:17:46,750  - teacher_forcing_ratio: 0.5
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Computational Parameters:
2024-07-30 02:17:46,750  - num_workers: 4
2024-07-30 02:17:46,750  - device: device(type='cuda', index=0)
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Dataset Splits:
2024-07-30 02:17:46,751  - train: 129388 data points
2024-07-30 02:17:46,751  - dev: 18485 data points
2024-07-30 02:17:46,751  - test: 36969 data points
2024-07-30 02:17:46,751 ----------------------------------------------------------------------------------------------------
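The batch counters below run to 1011 because 129388 training points at batch_size 128 give ceil(129388 / 128) = 1011 batches. A minimal sketch of a loader configured from the logged parameters (the `TensorDataset` is a stand-in for the real dataset, and `shuffle=True` is an assumption):

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset with the logged number of training points.
train = TensorDataset(torch.zeros(129388, 1))
loader = DataLoader(train, batch_size=128, num_workers=4, shuffle=True)
print(len(loader))  # ceil(129388 / 128) = 1011, matching "batch x/1011" below
```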
2024-07-30 02:17:46,751 EPOCH 1
2024-07-30 02:20:14,584 batch 101/1011 - loss 2.86735094 - lr 0.0010 - time 147.83s
2024-07-30 02:22:39,297 batch 202/1011 - loss 2.72946281 - lr 0.0010 - time 292.55s
2024-07-30 02:24:56,481 batch 303/1011 - loss 2.65172425 - lr 0.0010 - time 429.73s
2024-07-30 02:27:17,762 batch 404/1011 - loss 2.60293996 - lr 0.0010 - time 571.01s
2024-07-30 02:29:41,974 batch 505/1011 - loss 2.56301742 - lr 0.0010 - time 715.22s
2024-07-30 02:32:09,632 batch 606/1011 - loss 2.52287651 - lr 0.0010 - time 862.88s
2024-07-30 02:34:33,931 batch 707/1011 - loss 2.47866768 - lr 0.0010 - time 1007.18s
2024-07-30 02:37:03,416 batch 808/1011 - loss 2.44011894 - lr 0.0010 - time 1156.66s
2024-07-30 02:39:22,440 batch 909/1011 - loss 2.40451258 - lr 0.0010 - time 1295.69s
2024-07-30 02:41:41,408 batch 1010/1011 - loss 2.37034330 - lr 0.0010 - time 1434.66s
2024-07-30 02:41:42,788 ----------------------------------------------------------------------------------------------------
2024-07-30 02:41:42,790 EPOCH 1 DONE
2024-07-30 02:42:20,287 TRAIN Loss: 2.3699
2024-07-30 02:42:20,288 DEV Loss: 3.7084
2024-07-30 02:42:20,288 DEV Perplexity: 40.7891
2024-07-30 02:42:20,288 New best score!
2024-07-30 02:42:20,290 ----------------------------------------------------------------------------------------------------
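The DEV Perplexity lines are the exponential of the dev cross-entropy loss: exp(3.7084) ≈ 40.79, matching the logged 40.7891 up to rounding of the displayed loss. A one-liner reproduces this:

```python
import math

def perplexity(mean_nll: float) -> float:
    # Perplexity is the exponential of the mean negative log-likelihood.
    return math.exp(mean_nll)

# Dev losses from epochs 1 and 2 of this run:
print(perplexity(3.7084))  # ≈ 40.79 (logged: 40.7891)
print(perplexity(4.0378))  # ≈ 56.70 (logged: 56.7028)
```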
2024-07-30 02:42:20,290 EPOCH 2
2024-07-30 02:44:47,233 batch 101/1011 - loss 2.00985493 - lr 0.0010 - time 146.94s
2024-07-30 02:47:17,648 batch 202/1011 - loss 1.99881361 - lr 0.0010 - time 297.36s
2024-07-30 02:49:36,884 batch 303/1011 - loss 1.98625110 - lr 0.0010 - time 436.59s
2024-07-30 02:52:05,471 batch 404/1011 - loss 1.97792626 - lr 0.0010 - time 585.18s
2024-07-30 02:54:23,284 batch 505/1011 - loss 1.96699081 - lr 0.0010 - time 722.99s
2024-07-30 02:56:42,486 batch 606/1011 - loss 1.95183234 - lr 0.0010 - time 862.20s
2024-07-30 02:59:04,204 batch 707/1011 - loss 1.94068404 - lr 0.0010 - time 1003.91s
2024-07-30 03:01:31,559 batch 808/1011 - loss 1.93031463 - lr 0.0010 - time 1151.27s
2024-07-30 03:03:52,584 batch 909/1011 - loss 1.91933983 - lr 0.0010 - time 1292.29s
2024-07-30 03:06:14,554 batch 1010/1011 - loss 1.90792970 - lr 0.0010 - time 1434.26s
2024-07-30 03:06:15,868 ----------------------------------------------------------------------------------------------------
2024-07-30 03:06:15,870 EPOCH 2 DONE
2024-07-30 03:06:53,487 TRAIN Loss: 1.9079
2024-07-30 03:06:53,488 DEV Loss: 4.0378
2024-07-30 03:06:53,488 DEV Perplexity: 56.7028
2024-07-30 03:06:53,488 No improvement for 1 epoch(s)
2024-07-30 03:06:53,488 ----------------------------------------------------------------------------------------------------
2024-07-30 03:06:53,489 EPOCH 3
2024-07-30 03:09:15,463 batch 101/1011 - loss 1.78907573 - lr 0.0010 - time 141.97s
2024-07-30 03:11:42,195 batch 202/1011 - loss 1.78422776 - lr 0.0010 - time 288.71s
2024-07-30 03:13:59,221 batch 303/1011 - loss 1.77906499 - lr 0.0010 - time 425.73s
2024-07-30 03:16:16,933 batch 404/1011 - loss 1.77259262 - lr 0.0010 - time 563.44s
2024-07-30 03:18:33,183 batch 505/1011 - loss 1.76395207 - lr 0.0010 - time 699.69s
2024-07-30 03:20:56,446 batch 606/1011 - loss 1.75870391 - lr 0.0010 - time 842.96s
2024-07-30 03:23:22,609 batch 707/1011 - loss 1.75321817 - lr 0.0010 - time 989.12s
2024-07-30 03:25:54,166 batch 808/1011 - loss 1.74617685 - lr 0.0010 - time 1140.68s
2024-07-30 03:28:20,633 batch 909/1011 - loss 1.74084473 - lr 0.0010 - time 1287.14s
2024-07-30 03:30:42,444 batch 1010/1011 - loss 1.73547362 - lr 0.0010 - time 1428.96s
2024-07-30 03:30:43,412 ----------------------------------------------------------------------------------------------------
2024-07-30 03:30:43,414 EPOCH 3 DONE
2024-07-30 03:31:20,957 TRAIN Loss: 1.7354
2024-07-30 03:31:20,958 DEV Loss: 4.1249
2024-07-30 03:31:20,958 DEV Perplexity: 61.8625
2024-07-30 03:31:20,958 No improvement for 2 epoch(s)
2024-07-30 03:31:20,958 ----------------------------------------------------------------------------------------------------
2024-07-30 03:31:20,958 EPOCH 4
2024-07-30 03:33:36,147 batch 101/1011 - loss 1.66521794 - lr 0.0010 - time 135.19s
2024-07-30 03:35:55,583 batch 202/1011 - loss 1.66554682 - lr 0.0010 - time 274.62s
2024-07-30 03:38:27,233 batch 303/1011 - loss 1.65796713 - lr 0.0010 - time 426.28s
2024-07-30 03:40:44,185 batch 404/1011 - loss 1.65309123 - lr 0.0010 - time 563.23s
2024-07-30 03:43:11,092 batch 505/1011 - loss 1.64910596 - lr 0.0010 - time 710.13s
2024-07-30 03:45:38,169 batch 606/1011 - loss 1.64491277 - lr 0.0010 - time 857.21s
2024-07-30 03:48:03,029 batch 707/1011 - loss 1.64139012 - lr 0.0010 - time 1002.07s
2024-07-30 03:50:23,760 batch 808/1011 - loss 1.63702920 - lr 0.0010 - time 1142.80s
2024-07-30 03:52:50,807 batch 909/1011 - loss 1.63416369 - lr 0.0010 - time 1289.85s
2024-07-30 03:55:07,385 batch 1010/1011 - loss 1.63085939 - lr 0.0010 - time 1426.43s
2024-07-30 03:55:08,515 ----------------------------------------------------------------------------------------------------
2024-07-30 03:55:08,516 EPOCH 4 DONE
2024-07-30 03:55:46,651 TRAIN Loss: 1.6309
2024-07-30 03:55:46,652 DEV Loss: 4.2698
2024-07-30 03:55:46,652 DEV Perplexity: 71.5087
2024-07-30 03:55:46,652 No improvement for 3 epoch(s)
2024-07-30 03:55:46,652 ----------------------------------------------------------------------------------------------------
2024-07-30 03:55:46,652 EPOCH 5
2024-07-30 03:58:14,943 batch 101/1011 - loss 1.57273971 - lr 0.0010 - time 148.29s
2024-07-30 04:00:48,395 batch 202/1011 - loss 1.57111556 - lr 0.0010 - time 301.74s
2024-07-30 04:03:11,425 batch 303/1011 - loss 1.57657209 - lr 0.0010 - time 444.77s
2024-07-30 04:05:35,078 batch 404/1011 - loss 1.57244594 - lr 0.0010 - time 588.43s
2024-07-30 04:07:57,959 batch 505/1011 - loss 1.57071598 - lr 0.0010 - time 731.31s
2024-07-30 04:10:15,011 batch 606/1011 - loss 1.56758577 - lr 0.0010 - time 868.36s
2024-07-30 04:12:35,064 batch 707/1011 - loss 1.56390217 - lr 0.0010 - time 1008.41s
2024-07-30 04:14:53,925 batch 808/1011 - loss 1.56026725 - lr 0.0010 - time 1147.27s
2024-07-30 04:17:11,220 batch 909/1011 - loss 1.55733682 - lr 0.0010 - time 1284.57s
2024-07-30 04:19:37,188 batch 1010/1011 - loss 1.55493684 - lr 0.0010 - time 1430.54s
2024-07-30 04:19:38,665 ----------------------------------------------------------------------------------------------------
2024-07-30 04:19:38,667 EPOCH 5 DONE
2024-07-30 04:20:16,279 TRAIN Loss: 1.5550
2024-07-30 04:20:16,279 DEV Loss: 4.2535
2024-07-30 04:20:16,279 DEV Perplexity: 70.3542
2024-07-30 04:20:16,279 No improvement for 4 epoch(s)
2024-07-30 04:20:16,279 ----------------------------------------------------------------------------------------------------
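The lr column drops from 0.0010 to 0.0001 in the next epoch: after scheduler_patience = 3 was exhausted, the fourth consecutive non-improving dev loss (epoch 5) triggered a reduction. A hedged sketch of a setup consistent with the log, using `ReduceLROnPlateau`; the factor of 0.1 is inferred from the observed 0.001 → 0.0001 drop, and the choice of Adam is an assumption (the log does not name the optimizer):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the Translator above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # logged learning_rate
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3)  # logged scheduler_patience

# Feeding in this run's dev losses for epochs 1-5 reproduces the drop:
# epoch 1 sets the best score, epochs 2-5 are four bad epochs > patience.
for dev_loss in [3.7084, 4.0378, 4.1249, 4.2698, 4.2535]:
    scheduler.step(dev_loss)
print(optimizer.param_groups[0]["lr"])  # ≈ 1e-4, as logged in epoch 6
```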
2024-07-30 04:20:16,279 EPOCH 6
2024-07-30 04:22:41,790 batch 101/1011 - loss 1.48926209 - lr 0.0001 - time 145.51s
2024-07-30 04:25:08,198 batch 202/1011 - loss 1.49229986 - lr 0.0001 - time 291.92s
2024-07-30 04:27:29,248 batch 303/1011 - loss 1.49066265 - lr 0.0001 - time 432.97s
2024-07-30 04:29:59,135 batch 404/1011 - loss 1.48735474 - lr 0.0001 - time 582.86s
2024-07-30 04:32:13,744 batch 505/1011 - loss 1.48638164 - lr 0.0001 - time 717.47s
2024-07-30 04:34:44,208 batch 606/1011 - loss 1.48563741 - lr 0.0001 - time 867.93s
2024-07-30 04:37:02,924 batch 707/1011 - loss 1.48429131 - lr 0.0001 - time 1006.64s
2024-07-30 04:39:29,286 batch 808/1011 - loss 1.48379995 - lr 0.0001 - time 1153.01s
2024-07-30 04:41:45,546 batch 909/1011 - loss 1.48132304 - lr 0.0001 - time 1289.27s
2024-07-30 04:44:12,255 batch 1010/1011 - loss 1.48057979 - lr 0.0001 - time 1435.98s
2024-07-30 04:44:13,677 ----------------------------------------------------------------------------------------------------
2024-07-30 04:44:13,680 EPOCH 6 DONE
2024-07-30 04:44:51,213 TRAIN Loss: 1.4806
2024-07-30 04:44:51,214 DEV Loss: 4.3550
2024-07-30 04:44:51,214 DEV Perplexity: 77.8631
2024-07-30 04:44:51,214 No improvement for 5 epoch(s)
2024-07-30 04:44:51,214 Patience reached: Terminating model training due to early stopping
2024-07-30 04:44:51,214 ----------------------------------------------------------------------------------------------------
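The early-stopping rule visible in the log (patience = 5, counting epochs since the best dev loss) can be sketched as a small helper; the function name is illustrative, not from the training code:

```python
def should_stop(dev_losses, patience=5):
    """Return True once `patience` consecutive epochs fail to beat the best dev loss."""
    best, bad = float("inf"), 0
    for loss in dev_losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return True
    return False

# The six dev losses from this run: epoch 1 sets the best score and
# epochs 2-6 are five straight non-improvements, so stopping fires at epoch 6.
run = [3.7084, 4.0378, 4.1249, 4.2698, 4.2535, 4.3550]
print(should_stop(run))  # True
```

Note the growing gap between TRAIN Loss (1.48) and DEV Loss (4.36): the model kept fitting the training data while dev perplexity worsened from epoch 2 onward, which is exactly the overfitting pattern patience-based stopping is meant to catch.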
2024-07-30 04:44:51,214 Finished Training
2024-07-30 04:46:11,048 TEST Perplexity: 40.8583
2024-07-30 04:56:51,808 TEST BLEU = 5.88 51.6/16.4/1.7/0.8 (BP = 1.000 ratio = 1.000 hyp_len = 62 ref_len = 62)
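The final line is in the sacrebleu-style signature format: the score is the brevity penalty (BP, here 1.000 since hypothesis and reference lengths match) times the geometric mean of the 1- to 4-gram precisions. Recomputing from the rounded precisions gives ≈ 5.8, consistent with the reported 5.88 up to display rounding:

```python
import math

def bleu_from_precisions(precisions, bp=1.0):
    # BLEU = BP * geometric mean of n-gram precisions, scaled to 0-100.
    return bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions)) * 100

# Precisions from the log line, as fractions: 51.6/16.4/1.7/0.8
print(bleu_from_precisions([0.516, 0.164, 0.017, 0.008]))  # ≈ 5.8
```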