2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Training Model
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Translator(
(encoder): EncoderLSTM(
(embedding): Embedding(111, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 512, batch_first=True)
)
(decoder): DecoderLSTM(
(embedding): Embedding(105, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 512, batch_first=True)
(attention): DotProductAttention(
(softmax): Softmax(dim=-1)
(combined2hidden): Sequential(
(0): Linear(in_features=1024, out_features=512, bias=True)
(1): ReLU()
)
)
(hidden2vocab): Linear(in_features=512, out_features=105, bias=True)
(log_softmax): LogSoftmax(dim=-1)
)
)
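The module dump above describes a recurrent encoder-decoder translator with dot-product attention. For orientation only, here is a minimal PyTorch sketch of modules whose printed structure matches this dump; the layer shapes mirror the log, but the forward logic, default arguments, and the bare Translator wrapper are illustrative assumptions, not the repository's actual implementation.

```python
# Sketch (assumption): modules whose printed structure matches the dump above.
# The repository's real Translator, forward logic, and helper names may differ.
import torch
from torch import nn


class EncoderLSTM(nn.Module):
    def __init__(self, vocab_size: int = 111, embedding_dim: int = 300, hidden_size: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(p=0.1)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)

    def forward(self, source_ids):
        embedded = self.dropout(self.embedding(source_ids))
        return self.lstm(embedded)  # (outputs, (hidden, cell))


class DotProductAttention(nn.Module):
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.softmax = nn.Softmax(dim=-1)
        self.combined2hidden = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
        )

    def forward(self, decoder_states, encoder_outputs):
        # decoder_states: (batch, tgt_len, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(decoder_states, encoder_outputs.transpose(1, 2))
        weights = self.softmax(scores)                 # attention over source positions
        context = torch.bmm(weights, encoder_outputs)  # (batch, tgt_len, hidden)
        return self.combined2hidden(torch.cat((decoder_states, context), dim=-1))


class DecoderLSTM(nn.Module):
    def __init__(self, vocab_size: int = 105, embedding_dim: int = 300, hidden_size: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(p=0.1)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.attention = DotProductAttention(hidden_size)
        self.hidden2vocab = nn.Linear(hidden_size, vocab_size)
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, target_ids, encoder_outputs, state=None):
        embedded = self.dropout(self.embedding(target_ids))
        output, state = self.lstm(embedded, state)
        attended = self.attention(output, encoder_outputs)
        return self.log_softmax(self.hidden2vocab(attended)), state


class Translator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = EncoderLSTM()
        self.decoder = DecoderLSTM()
```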
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Training Hyperparameters:
2024-07-30 02:17:46,750 - max_epochs: 10
2024-07-30 02:17:46,750 - learning_rate: 0.001
2024-07-30 02:17:46,750 - batch_size: 128
2024-07-30 02:17:46,750 - patience: 5
2024-07-30 02:17:46,750 - scheduler_patience: 3
2024-07-30 02:17:46,750 - teacher_forcing_ratio: 0.5
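The two patience values play different roles: scheduler_patience governs when the learning rate is reduced, while patience governs early stopping; teacher_forcing_ratio: 0.5 typically means the decoder is fed the gold token for roughly half of the decoding steps during training. The drop from lr 0.0010 to 0.0001 before epoch 6 and the termination after five non-improving epochs are consistent with a ReduceLROnPlateau-style scheduler plus a simple counter. The sketch below reproduces that behaviour from the dev losses reported later in this log; it is an assumption for illustration, since the trainer code itself is not part of the log.

```python
# Sketch (assumption): plausible wiring of scheduler_patience and patience.
# The dummy linear layer stands in for the Translator; real training is omitted.
import torch
from torch import nn

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3   # scheduler_patience
)

best_dev_loss = float("inf")
bad_epochs = 0

# Dev losses reported in this log for epochs 1-6.
dev_losses = [3.7084, 4.0378, 4.1249, 4.2698, 4.2535, 4.3550]

for epoch, dev_loss in enumerate(dev_losses, start=1):
    scheduler.step(dev_loss)                        # lr: 0.001 -> 0.0001 after the 4th bad epoch
    if dev_loss < best_dev_loss:
        best_dev_loss, bad_epochs = dev_loss, 0     # "New best score!"
    else:
        bad_epochs += 1                             # "No improvement for N epoch(s)"
        if bad_epochs >= 5:                         # patience
            print(f"Early stopping after epoch {epoch}")
            break
```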
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Computational Parameters:
2024-07-30 02:17:46,750 - num_workers: 4
2024-07-30 02:17:46,750 - device: device(type='cuda', index=0)
2024-07-30 02:17:46,750 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,750 Dataset Splits:
2024-07-30 02:17:46,751 - train: 129388 data points
2024-07-30 02:17:46,751 - dev: 18485 data points
2024-07-30 02:17:46,751 - test: 36969 data points
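Taken together, the splits cover 184,842 data points and correspond, up to rounding, to a 70/10/20 train/dev/test partition; a quick check:

```python
# Split sizes as reported above; ratios are rounded to three decimals.
splits = {"train": 129_388, "dev": 18_485, "test": 36_969}
total = sum(splits.values())                               # 184842
print(total, {name: round(n / total, 3) for name, n in splits.items()})
# -> 184842 {'train': 0.7, 'dev': 0.1, 'test': 0.2}
```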
2024-07-30 02:17:46,751 ----------------------------------------------------------------------------------------------------
2024-07-30 02:17:46,751 EPOCH 1
2024-07-30 02:20:14,584 batch 101/1011 - loss 2.86735094 - lr 0.0010 - time 147.83s
2024-07-30 02:22:39,297 batch 202/1011 - loss 2.72946281 - lr 0.0010 - time 292.55s
2024-07-30 02:24:56,481 batch 303/1011 - loss 2.65172425 - lr 0.0010 - time 429.73s
2024-07-30 02:27:17,762 batch 404/1011 - loss 2.60293996 - lr 0.0010 - time 571.01s
2024-07-30 02:29:41,974 batch 505/1011 - loss 2.56301742 - lr 0.0010 - time 715.22s
2024-07-30 02:32:09,632 batch 606/1011 - loss 2.52287651 - lr 0.0010 - time 862.88s
2024-07-30 02:34:33,931 batch 707/1011 - loss 2.47866768 - lr 0.0010 - time 1007.18s
2024-07-30 02:37:03,416 batch 808/1011 - loss 2.44011894 - lr 0.0010 - time 1156.66s
2024-07-30 02:39:22,440 batch 909/1011 - loss 2.40451258 - lr 0.0010 - time 1295.69s
2024-07-30 02:41:41,408 batch 1010/1011 - loss 2.37034330 - lr 0.0010 - time 1434.66s
2024-07-30 02:41:42,788 ----------------------------------------------------------------------------------------------------
2024-07-30 02:41:42,790 EPOCH 1 DONE
2024-07-30 02:42:20,287 TRAIN Loss: 2.3699
2024-07-30 02:42:20,288 DEV Loss: 3.7084
2024-07-30 02:42:20,288 DEV Perplexity: 40.7891
2024-07-30 02:42:20,288 New best score!
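The reported perplexity is the exponential of the corresponding dev loss (treating the loss as a mean cross-entropy in nats); for epoch 1:

```python
import math

dev_loss = 3.7084              # DEV Loss after epoch 1
print(math.exp(dev_loss))      # ~40.79, the reported DEV Perplexity of 40.7891
```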
2024-07-30 02:42:20,290 ----------------------------------------------------------------------------------------------------
2024-07-30 02:42:20,290 EPOCH 2
2024-07-30 02:44:47,233 batch 101/1011 - loss 2.00985493 - lr 0.0010 - time 146.94s
2024-07-30 02:47:17,648 batch 202/1011 - loss 1.99881361 - lr 0.0010 - time 297.36s
2024-07-30 02:49:36,884 batch 303/1011 - loss 1.98625110 - lr 0.0010 - time 436.59s
2024-07-30 02:52:05,471 batch 404/1011 - loss 1.97792626 - lr 0.0010 - time 585.18s
2024-07-30 02:54:23,284 batch 505/1011 - loss 1.96699081 - lr 0.0010 - time 722.99s
2024-07-30 02:56:42,486 batch 606/1011 - loss 1.95183234 - lr 0.0010 - time 862.20s
2024-07-30 02:59:04,204 batch 707/1011 - loss 1.94068404 - lr 0.0010 - time 1003.91s
2024-07-30 03:01:31,559 batch 808/1011 - loss 1.93031463 - lr 0.0010 - time 1151.27s
2024-07-30 03:03:52,584 batch 909/1011 - loss 1.91933983 - lr 0.0010 - time 1292.29s
2024-07-30 03:06:14,554 batch 1010/1011 - loss 1.90792970 - lr 0.0010 - time 1434.26s
2024-07-30 03:06:15,868 ----------------------------------------------------------------------------------------------------
2024-07-30 03:06:15,870 EPOCH 2 DONE
2024-07-30 03:06:53,487 TRAIN Loss: 1.9079
2024-07-30 03:06:53,488 DEV Loss: 4.0378
2024-07-30 03:06:53,488 DEV Perplexity: 56.7028
2024-07-30 03:06:53,488 No improvement for 1 epoch(s)
2024-07-30 03:06:53,488 ----------------------------------------------------------------------------------------------------
2024-07-30 03:06:53,489 EPOCH 3
2024-07-30 03:09:15,463 batch 101/1011 - loss 1.78907573 - lr 0.0010 - time 141.97s
2024-07-30 03:11:42,195 batch 202/1011 - loss 1.78422776 - lr 0.0010 - time 288.71s
2024-07-30 03:13:59,221 batch 303/1011 - loss 1.77906499 - lr 0.0010 - time 425.73s
2024-07-30 03:16:16,933 batch 404/1011 - loss 1.77259262 - lr 0.0010 - time 563.44s
2024-07-30 03:18:33,183 batch 505/1011 - loss 1.76395207 - lr 0.0010 - time 699.69s
2024-07-30 03:20:56,446 batch 606/1011 - loss 1.75870391 - lr 0.0010 - time 842.96s
2024-07-30 03:23:22,609 batch 707/1011 - loss 1.75321817 - lr 0.0010 - time 989.12s
2024-07-30 03:25:54,166 batch 808/1011 - loss 1.74617685 - lr 0.0010 - time 1140.68s
2024-07-30 03:28:20,633 batch 909/1011 - loss 1.74084473 - lr 0.0010 - time 1287.14s
2024-07-30 03:30:42,444 batch 1010/1011 - loss 1.73547362 - lr 0.0010 - time 1428.96s
2024-07-30 03:30:43,412 ----------------------------------------------------------------------------------------------------
2024-07-30 03:30:43,414 EPOCH 3 DONE
2024-07-30 03:31:20,957 TRAIN Loss: 1.7354
2024-07-30 03:31:20,958 DEV Loss: 4.1249
2024-07-30 03:31:20,958 DEV Perplexity: 61.8625
2024-07-30 03:31:20,958 No improvement for 2 epoch(s)
2024-07-30 03:31:20,958 ----------------------------------------------------------------------------------------------------
2024-07-30 03:31:20,958 EPOCH 4
2024-07-30 03:33:36,147 batch 101/1011 - loss 1.66521794 - lr 0.0010 - time 135.19s
2024-07-30 03:35:55,583 batch 202/1011 - loss 1.66554682 - lr 0.0010 - time 274.62s
2024-07-30 03:38:27,233 batch 303/1011 - loss 1.65796713 - lr 0.0010 - time 426.28s
2024-07-30 03:40:44,185 batch 404/1011 - loss 1.65309123 - lr 0.0010 - time 563.23s
2024-07-30 03:43:11,092 batch 505/1011 - loss 1.64910596 - lr 0.0010 - time 710.13s
2024-07-30 03:45:38,169 batch 606/1011 - loss 1.64491277 - lr 0.0010 - time 857.21s
2024-07-30 03:48:03,029 batch 707/1011 - loss 1.64139012 - lr 0.0010 - time 1002.07s
2024-07-30 03:50:23,760 batch 808/1011 - loss 1.63702920 - lr 0.0010 - time 1142.80s
2024-07-30 03:52:50,807 batch 909/1011 - loss 1.63416369 - lr 0.0010 - time 1289.85s
2024-07-30 03:55:07,385 batch 1010/1011 - loss 1.63085939 - lr 0.0010 - time 1426.43s
2024-07-30 03:55:08,515 ----------------------------------------------------------------------------------------------------
2024-07-30 03:55:08,516 EPOCH 4 DONE
2024-07-30 03:55:46,651 TRAIN Loss: 1.6309
2024-07-30 03:55:46,652 DEV Loss: 4.2698
2024-07-30 03:55:46,652 DEV Perplexity: 71.5087
2024-07-30 03:55:46,652 No improvement for 3 epoch(s)
2024-07-30 03:55:46,652 ----------------------------------------------------------------------------------------------------
2024-07-30 03:55:46,652 EPOCH 5
2024-07-30 03:58:14,943 batch 101/1011 - loss 1.57273971 - lr 0.0010 - time 148.29s
2024-07-30 04:00:48,395 batch 202/1011 - loss 1.57111556 - lr 0.0010 - time 301.74s
2024-07-30 04:03:11,425 batch 303/1011 - loss 1.57657209 - lr 0.0010 - time 444.77s
2024-07-30 04:05:35,078 batch 404/1011 - loss 1.57244594 - lr 0.0010 - time 588.43s
2024-07-30 04:07:57,959 batch 505/1011 - loss 1.57071598 - lr 0.0010 - time 731.31s
2024-07-30 04:10:15,011 batch 606/1011 - loss 1.56758577 - lr 0.0010 - time 868.36s
2024-07-30 04:12:35,064 batch 707/1011 - loss 1.56390217 - lr 0.0010 - time 1008.41s
2024-07-30 04:14:53,925 batch 808/1011 - loss 1.56026725 - lr 0.0010 - time 1147.27s
2024-07-30 04:17:11,220 batch 909/1011 - loss 1.55733682 - lr 0.0010 - time 1284.57s
2024-07-30 04:19:37,188 batch 1010/1011 - loss 1.55493684 - lr 0.0010 - time 1430.54s
2024-07-30 04:19:38,665 ----------------------------------------------------------------------------------------------------
2024-07-30 04:19:38,667 EPOCH 5 DONE
2024-07-30 04:20:16,279 TRAIN Loss: 1.5550
2024-07-30 04:20:16,279 DEV Loss: 4.2535
2024-07-30 04:20:16,279 DEV Perplexity: 70.3542
2024-07-30 04:20:16,279 No improvement for 4 epoch(s)
2024-07-30 04:20:16,279 ----------------------------------------------------------------------------------------------------
2024-07-30 04:20:16,279 EPOCH 6
2024-07-30 04:22:41,790 batch 101/1011 - loss 1.48926209 - lr 0.0001 - time 145.51s
2024-07-30 04:25:08,198 batch 202/1011 - loss 1.49229986 - lr 0.0001 - time 291.92s
2024-07-30 04:27:29,248 batch 303/1011 - loss 1.49066265 - lr 0.0001 - time 432.97s
2024-07-30 04:29:59,135 batch 404/1011 - loss 1.48735474 - lr 0.0001 - time 582.86s
2024-07-30 04:32:13,744 batch 505/1011 - loss 1.48638164 - lr 0.0001 - time 717.47s
2024-07-30 04:34:44,208 batch 606/1011 - loss 1.48563741 - lr 0.0001 - time 867.93s
2024-07-30 04:37:02,924 batch 707/1011 - loss 1.48429131 - lr 0.0001 - time 1006.64s
2024-07-30 04:39:29,286 batch 808/1011 - loss 1.48379995 - lr 0.0001 - time 1153.01s
2024-07-30 04:41:45,546 batch 909/1011 - loss 1.48132304 - lr 0.0001 - time 1289.27s
2024-07-30 04:44:12,255 batch 1010/1011 - loss 1.48057979 - lr 0.0001 - time 1435.98s
2024-07-30 04:44:13,677 ----------------------------------------------------------------------------------------------------
2024-07-30 04:44:13,680 EPOCH 6 DONE
2024-07-30 04:44:51,213 TRAIN Loss: 1.4806
2024-07-30 04:44:51,214 DEV Loss: 4.3550
2024-07-30 04:44:51,214 DEV Perplexity: 77.8631
2024-07-30 04:44:51,214 No improvement for 5 epoch(s)
2024-07-30 04:44:51,214 Patience reached: Terminating model training due to early stopping
2024-07-30 04:44:51,214 ----------------------------------------------------------------------------------------------------
2024-07-30 04:44:51,214 Finished Training
2024-07-30 04:46:11,048 TEST Perplexity: 40.8583
2024-07-30 04:56:51,808 TEST BLEU = 5.88 51.6/16.4/1.7/0.8 (BP = 1.000 ratio = 1.000 hyp_len = 62 ref_len = 62)
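The final line matches the usual sacrebleu-style signature: corpus BLEU, the 1- to 4-gram precisions, the brevity penalty BP, and the hypothesis/reference length ratio. With uniform weights, BLEU is the brevity penalty times the geometric mean of the four precisions; recomputing it from the rounded precisions printed above roughly recovers the reported score:

```python
import math

precisions = [51.6, 16.4, 1.7, 0.8]   # 1- to 4-gram precisions in percent
bp = 1.0                              # brevity penalty (hyp_len == ref_len == 62)
bleu = 100 * bp * math.exp(sum(math.log(p / 100) for p in precisions) / 4)
print(round(bleu, 2))                 # ~5.8; the reported 5.88 uses the unrounded precisions
```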