2023-11-16 06:11:33,784 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,786 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-11-16 06:11:33,786 ----------------------------------------------------------------------------------------------------
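Note: the module printout above is a standard Flair fine-tuning architecture: xlm-roberta-large token embeddings feeding a LockedDropout layer and one linear head over 13 BIOES tags, with no CRF and no RNN. A minimal sketch of assembling such a tagger with the Flair API follows; this is not the author's actual training script, the pooling settings are assumptions (they are not recoverable from the printout), and the label dictionary is built by hand here from the tag listing at the end of this log, whereas in practice it would come from the corpus described in the next section.

    from flair.data import Dictionary
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    # Hypothetical stand-in for corpus.make_label_dictionary(label_type="ner");
    # the 13 tags match the dictionary printed near the end of this log.
    label_dict = Dictionary(add_unk=False)
    for tag in ["O", "S-LOC", "B-LOC", "E-LOC", "I-LOC", "S-ORG", "B-ORG",
                "E-ORG", "I-ORG", "S-PER", "B-PER", "E-PER", "I-PER"]:
        label_dict.add_item(tag)

    # XLM-RoBERTa-large embeddings, fine-tuned end to end (layer and
    # subtoken-pooling choices are assumptions).
    embeddings = TransformerWordEmbeddings(
        model="xlm-roberta-large",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # Linear head only: matches the printed LockedDropout(p=0.5) +
    # Linear(1024 -> 13) + CrossEntropyLoss modules.
    tagger = SequenceTagger(
        hidden_size=256,          # required by the signature, unused without an RNN
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )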
2023-11-16 06:11:33,786 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
- ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
- ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-16 06:11:33,786 ----------------------------------------------------------------------------------------------------
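Note: the corpus block corresponds to the WikiANN/XTREME NER data that Flair caches under ~/.flair/datasets/ner_multi_xtreme. A minimal loading sketch, assuming the built-in NER_MULTI_XTREME loader; the English portion here contributes training sentences only (20000 train + 0 dev + 0 test), which may reflect a customised split rather than the loader's defaults.

    from flair.datasets import NER_MULTI_XTREME

    # English + Georgian splits; downloaded on first use to
    # ~/.flair/datasets/ner_multi_xtreme/{en,ka}.
    corpus = NER_MULTI_XTREME(languages=["en", "ka"])

    # 13-tag BIOES dictionary over LOC/ORG/PER, as printed at the end of this log.
    label_dict = corpus.make_label_dictionary(label_type="ner")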
2023-11-16 06:11:33,786 Train: 30000 sentences
2023-11-16 06:11:33,786 (train_with_dev=False, train_with_test=False)
2023-11-16 06:11:33,786 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,786 Training Params:
2023-11-16 06:11:33,786 - learning_rate: "5e-06"
2023-11-16 06:11:33,786 - mini_batch_size: "4"
2023-11-16 06:11:33,786 - max_epochs: "10"
2023-11-16 06:11:33,786 - shuffle: "True"
2023-11-16 06:11:33,786 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,786 Plugins:
2023-11-16 06:11:33,786 - TensorboardLogger
2023-11-16 06:11:33,786 - LinearScheduler | warmup_fraction: '0.1'
2023-11-16 06:11:33,786 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,786 Final evaluation on model from best epoch (best-model.pt)
2023-11-16 06:11:33,787 - metric: "('micro avg', 'f1-score')"
2023-11-16 06:11:33,787 ----------------------------------------------------------------------------------------------------
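Note: the training parameters, plugin list, and model-selection metric above map onto Flair's fine_tune entry point, which by default uses an Adam-family optimizer with a linear warmup schedule (its default warmup_fraction is 0.1, matching the LinearScheduler line, and the momentum column is logged as 0.000000 throughout, consistent with such an optimizer) and saves best-model.pt whenever dev micro-F1 improves. A minimal sketch, reusing the tagger and corpus from the sketches above; attaching the TensorboardLogger plugin seen in the plugin list is omitted here.

    from flair.trainers import ModelTrainer

    trainer = ModelTrainer(tagger, corpus)

    # learning_rate, mini_batch_size, and max_epochs as in the
    # "Training Params" block of this log.
    trainer.fine_tune(
        "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-4",
        learning_rate=5e-06,
        mini_batch_size=4,
        max_epochs=10,
    )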
2023-11-16 06:11:33,787 Computation:
2023-11-16 06:11:33,787 - compute on device: cuda:0
2023-11-16 06:11:33,787 - embedding storage: none
2023-11-16 06:11:33,787 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,787 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-4"
2023-11-16 06:11:33,787 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,787 ----------------------------------------------------------------------------------------------------
2023-11-16 06:11:33,787 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-16 06:13:08,349 epoch 1 - iter 750/7500 - loss 2.53865900 - time (sec): 94.56 - samples/sec: 253.75 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:14:42,568 epoch 1 - iter 1500/7500 - loss 2.13967550 - time (sec): 188.78 - samples/sec: 256.47 - lr: 0.000001 - momentum: 0.000000
2023-11-16 06:16:16,468 epoch 1 - iter 2250/7500 - loss 1.90406353 - time (sec): 282.68 - samples/sec: 256.63 - lr: 0.000001 - momentum: 0.000000
2023-11-16 06:17:48,884 epoch 1 - iter 3000/7500 - loss 1.67899229 - time (sec): 375.10 - samples/sec: 256.80 - lr: 0.000002 - momentum: 0.000000
2023-11-16 06:19:19,876 epoch 1 - iter 3750/7500 - loss 1.48518547 - time (sec): 466.09 - samples/sec: 258.11 - lr: 0.000002 - momentum: 0.000000
2023-11-16 06:20:52,111 epoch 1 - iter 4500/7500 - loss 1.33429739 - time (sec): 558.32 - samples/sec: 259.20 - lr: 0.000003 - momentum: 0.000000
2023-11-16 06:22:25,071 epoch 1 - iter 5250/7500 - loss 1.22009996 - time (sec): 651.28 - samples/sec: 258.72 - lr: 0.000003 - momentum: 0.000000
2023-11-16 06:23:57,896 epoch 1 - iter 6000/7500 - loss 1.13315230 - time (sec): 744.11 - samples/sec: 258.55 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:25:28,930 epoch 1 - iter 6750/7500 - loss 1.06203012 - time (sec): 835.14 - samples/sec: 259.35 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:27:03,002 epoch 1 - iter 7500/7500 - loss 1.00081782 - time (sec): 929.21 - samples/sec: 259.14 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:27:03,005 ----------------------------------------------------------------------------------------------------
2023-11-16 06:27:03,005 EPOCH 1 done: loss 1.0008 - lr: 0.000005
2023-11-16 06:27:30,760 DEV : loss 0.3205418884754181 - f1-score (micro avg) 0.7957
2023-11-16 06:27:33,296 saving best model
2023-11-16 06:27:35,260 ----------------------------------------------------------------------------------------------------
2023-11-16 06:29:08,221 epoch 2 - iter 750/7500 - loss 0.40584352 - time (sec): 92.96 - samples/sec: 259.85 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:30:41,767 epoch 2 - iter 1500/7500 - loss 0.41800053 - time (sec): 186.50 - samples/sec: 258.62 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:32:13,981 epoch 2 - iter 2250/7500 - loss 0.40515032 - time (sec): 278.72 - samples/sec: 260.55 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:33:44,785 epoch 2 - iter 3000/7500 - loss 0.40416870 - time (sec): 369.52 - samples/sec: 261.05 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:35:15,515 epoch 2 - iter 3750/7500 - loss 0.40544240 - time (sec): 460.25 - samples/sec: 263.21 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:36:48,219 epoch 2 - iter 4500/7500 - loss 0.40263197 - time (sec): 552.96 - samples/sec: 262.37 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:38:23,766 epoch 2 - iter 5250/7500 - loss 0.39942117 - time (sec): 648.50 - samples/sec: 260.48 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:39:57,620 epoch 2 - iter 6000/7500 - loss 0.40065088 - time (sec): 742.36 - samples/sec: 259.79 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:41:29,077 epoch 2 - iter 6750/7500 - loss 0.39965016 - time (sec): 833.81 - samples/sec: 260.34 - lr: 0.000005 - momentum: 0.000000
2023-11-16 06:43:02,733 epoch 2 - iter 7500/7500 - loss 0.39861413 - time (sec): 927.47 - samples/sec: 259.63 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:43:02,736 ----------------------------------------------------------------------------------------------------
2023-11-16 06:43:02,736 EPOCH 2 done: loss 0.3986 - lr: 0.000004
2023-11-16 06:43:29,322 DEV : loss 0.2607610523700714 - f1-score (micro avg) 0.8643
2023-11-16 06:43:31,142 saving best model
2023-11-16 06:43:33,553 ----------------------------------------------------------------------------------------------------
2023-11-16 06:45:08,034 epoch 3 - iter 750/7500 - loss 0.37315879 - time (sec): 94.48 - samples/sec: 253.04 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:46:39,743 epoch 3 - iter 1500/7500 - loss 0.35743568 - time (sec): 186.18 - samples/sec: 256.62 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:48:12,901 epoch 3 - iter 2250/7500 - loss 0.35305153 - time (sec): 279.34 - samples/sec: 259.77 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:49:45,353 epoch 3 - iter 3000/7500 - loss 0.35234824 - time (sec): 371.79 - samples/sec: 259.85 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:51:20,378 epoch 3 - iter 3750/7500 - loss 0.35046792 - time (sec): 466.82 - samples/sec: 258.57 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:52:53,238 epoch 3 - iter 4500/7500 - loss 0.35142197 - time (sec): 559.68 - samples/sec: 259.89 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:54:23,949 epoch 3 - iter 5250/7500 - loss 0.34665555 - time (sec): 650.39 - samples/sec: 260.41 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:55:56,860 epoch 3 - iter 6000/7500 - loss 0.35003084 - time (sec): 743.30 - samples/sec: 259.59 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:57:30,078 epoch 3 - iter 6750/7500 - loss 0.34700719 - time (sec): 836.52 - samples/sec: 259.35 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:59:02,988 epoch 3 - iter 7500/7500 - loss 0.34834444 - time (sec): 929.43 - samples/sec: 259.08 - lr: 0.000004 - momentum: 0.000000
2023-11-16 06:59:02,990 ----------------------------------------------------------------------------------------------------
2023-11-16 06:59:02,990 EPOCH 3 done: loss 0.3483 - lr: 0.000004
2023-11-16 06:59:30,217 DEV : loss 0.2834814190864563 - f1-score (micro avg) 0.881
2023-11-16 06:59:32,866 saving best model
2023-11-16 06:59:35,803 ----------------------------------------------------------------------------------------------------
2023-11-16 07:01:09,960 epoch 4 - iter 750/7500 - loss 0.29042774 - time (sec): 94.15 - samples/sec: 256.86 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:02:45,034 epoch 4 - iter 1500/7500 - loss 0.28875226 - time (sec): 189.23 - samples/sec: 258.73 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:04:20,800 epoch 4 - iter 2250/7500 - loss 0.30241778 - time (sec): 284.99 - samples/sec: 255.97 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:05:55,249 epoch 4 - iter 3000/7500 - loss 0.30810931 - time (sec): 379.44 - samples/sec: 254.41 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:07:28,778 epoch 4 - iter 3750/7500 - loss 0.30459660 - time (sec): 472.97 - samples/sec: 255.40 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:08:59,582 epoch 4 - iter 4500/7500 - loss 0.30550384 - time (sec): 563.77 - samples/sec: 257.73 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:10:31,165 epoch 4 - iter 5250/7500 - loss 0.30595152 - time (sec): 655.36 - samples/sec: 258.24 - lr: 0.000004 - momentum: 0.000000
2023-11-16 07:12:04,192 epoch 4 - iter 6000/7500 - loss 0.30648476 - time (sec): 748.38 - samples/sec: 258.00 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:13:38,216 epoch 4 - iter 6750/7500 - loss 0.30712803 - time (sec): 842.41 - samples/sec: 257.62 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:15:12,386 epoch 4 - iter 7500/7500 - loss 0.30384345 - time (sec): 936.58 - samples/sec: 257.10 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:15:12,389 ----------------------------------------------------------------------------------------------------
2023-11-16 07:15:12,389 EPOCH 4 done: loss 0.3038 - lr: 0.000003
2023-11-16 07:15:39,642 DEV : loss 0.2750042676925659 - f1-score (micro avg) 0.8871
2023-11-16 07:15:41,637 saving best model
2023-11-16 07:15:44,075 ----------------------------------------------------------------------------------------------------
2023-11-16 07:17:17,606 epoch 5 - iter 750/7500 - loss 0.22837945 - time (sec): 93.53 - samples/sec: 253.79 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:18:50,674 epoch 5 - iter 1500/7500 - loss 0.24801582 - time (sec): 186.59 - samples/sec: 255.33 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:20:23,354 epoch 5 - iter 2250/7500 - loss 0.24364625 - time (sec): 279.27 - samples/sec: 258.70 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:21:53,572 epoch 5 - iter 3000/7500 - loss 0.25086533 - time (sec): 369.49 - samples/sec: 261.28 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:23:27,878 epoch 5 - iter 3750/7500 - loss 0.25125342 - time (sec): 463.80 - samples/sec: 260.45 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:24:59,899 epoch 5 - iter 4500/7500 - loss 0.25211752 - time (sec): 555.82 - samples/sec: 259.74 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:26:31,650 epoch 5 - iter 5250/7500 - loss 0.25096563 - time (sec): 647.57 - samples/sec: 259.86 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:28:06,382 epoch 5 - iter 6000/7500 - loss 0.25437307 - time (sec): 742.30 - samples/sec: 258.90 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:29:38,647 epoch 5 - iter 6750/7500 - loss 0.25716650 - time (sec): 834.57 - samples/sec: 259.17 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:31:13,423 epoch 5 - iter 7500/7500 - loss 0.25526851 - time (sec): 929.34 - samples/sec: 259.10 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:31:13,426 ----------------------------------------------------------------------------------------------------
2023-11-16 07:31:13,427 EPOCH 5 done: loss 0.2553 - lr: 0.000003
2023-11-16 07:31:40,891 DEV : loss 0.2662450671195984 - f1-score (micro avg) 0.8974
2023-11-16 07:31:43,349 saving best model
2023-11-16 07:31:46,083 ----------------------------------------------------------------------------------------------------
2023-11-16 07:33:16,627 epoch 6 - iter 750/7500 - loss 0.19587155 - time (sec): 90.54 - samples/sec: 263.71 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:34:47,310 epoch 6 - iter 1500/7500 - loss 0.20788294 - time (sec): 181.22 - samples/sec: 265.84 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:36:21,324 epoch 6 - iter 2250/7500 - loss 0.20608536 - time (sec): 275.24 - samples/sec: 264.05 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:37:54,486 epoch 6 - iter 3000/7500 - loss 0.21411200 - time (sec): 368.40 - samples/sec: 261.45 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:39:27,386 epoch 6 - iter 3750/7500 - loss 0.21815036 - time (sec): 461.30 - samples/sec: 260.18 - lr: 0.000003 - momentum: 0.000000
2023-11-16 07:41:00,912 epoch 6 - iter 4500/7500 - loss 0.21725635 - time (sec): 554.83 - samples/sec: 260.18 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:42:34,526 epoch 6 - iter 5250/7500 - loss 0.21942273 - time (sec): 648.44 - samples/sec: 259.04 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:44:07,951 epoch 6 - iter 6000/7500 - loss 0.22107059 - time (sec): 741.87 - samples/sec: 258.58 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:45:41,437 epoch 6 - iter 6750/7500 - loss 0.22258724 - time (sec): 835.35 - samples/sec: 258.83 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:47:12,427 epoch 6 - iter 7500/7500 - loss 0.22153847 - time (sec): 926.34 - samples/sec: 259.94 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:47:12,430 ----------------------------------------------------------------------------------------------------
2023-11-16 07:47:12,430 EPOCH 6 done: loss 0.2215 - lr: 0.000002
2023-11-16 07:47:39,790 DEV : loss 0.2961623966693878 - f1-score (micro avg) 0.9003
2023-11-16 07:47:42,072 saving best model
2023-11-16 07:47:44,511 ----------------------------------------------------------------------------------------------------
2023-11-16 07:49:18,880 epoch 7 - iter 750/7500 - loss 0.16803306 - time (sec): 94.36 - samples/sec: 255.20 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:50:54,973 epoch 7 - iter 1500/7500 - loss 0.17324952 - time (sec): 190.46 - samples/sec: 254.75 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:52:32,192 epoch 7 - iter 2250/7500 - loss 0.17809510 - time (sec): 287.68 - samples/sec: 251.96 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:54:06,050 epoch 7 - iter 3000/7500 - loss 0.18157709 - time (sec): 381.53 - samples/sec: 252.63 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:55:39,777 epoch 7 - iter 3750/7500 - loss 0.18010115 - time (sec): 475.26 - samples/sec: 252.98 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:57:12,748 epoch 7 - iter 4500/7500 - loss 0.18253655 - time (sec): 568.23 - samples/sec: 253.84 - lr: 0.000002 - momentum: 0.000000
2023-11-16 07:58:45,045 epoch 7 - iter 5250/7500 - loss 0.18478993 - time (sec): 660.53 - samples/sec: 254.99 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:00:17,924 epoch 7 - iter 6000/7500 - loss 0.18257351 - time (sec): 753.41 - samples/sec: 255.75 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:01:49,343 epoch 7 - iter 6750/7500 - loss 0.18422323 - time (sec): 844.83 - samples/sec: 256.40 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:03:22,527 epoch 7 - iter 7500/7500 - loss 0.18484974 - time (sec): 938.01 - samples/sec: 256.71 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:03:22,531 ----------------------------------------------------------------------------------------------------
2023-11-16 08:03:22,531 EPOCH 7 done: loss 0.1848 - lr: 0.000002
2023-11-16 08:03:48,970 DEV : loss 0.305960088968277 - f1-score (micro avg) 0.9028
2023-11-16 08:03:51,887 saving best model
2023-11-16 08:03:53,942 ----------------------------------------------------------------------------------------------------
2023-11-16 08:05:28,806 epoch 8 - iter 750/7500 - loss 0.14648152 - time (sec): 94.86 - samples/sec: 244.77 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:07:04,823 epoch 8 - iter 1500/7500 - loss 0.15989226 - time (sec): 190.88 - samples/sec: 250.15 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:08:37,702 epoch 8 - iter 2250/7500 - loss 0.16196706 - time (sec): 283.76 - samples/sec: 254.57 - lr: 0.000002 - momentum: 0.000000
2023-11-16 08:10:09,006 epoch 8 - iter 3000/7500 - loss 0.16121972 - time (sec): 375.06 - samples/sec: 257.26 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:11:41,390 epoch 8 - iter 3750/7500 - loss 0.15974733 - time (sec): 467.45 - samples/sec: 257.59 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:13:14,066 epoch 8 - iter 4500/7500 - loss 0.15727904 - time (sec): 560.12 - samples/sec: 258.26 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:14:47,177 epoch 8 - iter 5250/7500 - loss 0.15597106 - time (sec): 653.23 - samples/sec: 257.92 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:16:17,918 epoch 8 - iter 6000/7500 - loss 0.15441827 - time (sec): 743.97 - samples/sec: 258.11 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:17:52,222 epoch 8 - iter 6750/7500 - loss 0.15283100 - time (sec): 838.28 - samples/sec: 258.17 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:19:25,488 epoch 8 - iter 7500/7500 - loss 0.15507668 - time (sec): 931.54 - samples/sec: 258.49 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:19:25,491 ----------------------------------------------------------------------------------------------------
2023-11-16 08:19:25,491 EPOCH 8 done: loss 0.1551 - lr: 0.000001
2023-11-16 08:19:53,344 DEV : loss 0.3231204152107239 - f1-score (micro avg) 0.9014
2023-11-16 08:19:55,450 ----------------------------------------------------------------------------------------------------
2023-11-16 08:21:28,400 epoch 9 - iter 750/7500 - loss 0.12523890 - time (sec): 92.95 - samples/sec: 258.76 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:23:00,060 epoch 9 - iter 1500/7500 - loss 0.12801485 - time (sec): 184.61 - samples/sec: 263.03 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:24:36,577 epoch 9 - iter 2250/7500 - loss 0.13158450 - time (sec): 281.12 - samples/sec: 255.88 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:26:08,372 epoch 9 - iter 3000/7500 - loss 0.12955430 - time (sec): 372.92 - samples/sec: 256.96 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:27:42,811 epoch 9 - iter 3750/7500 - loss 0.13110177 - time (sec): 467.36 - samples/sec: 256.70 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:29:15,844 epoch 9 - iter 4500/7500 - loss 0.13696235 - time (sec): 560.39 - samples/sec: 256.51 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:30:48,381 epoch 9 - iter 5250/7500 - loss 0.13444283 - time (sec): 652.93 - samples/sec: 256.96 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:32:21,725 epoch 9 - iter 6000/7500 - loss 0.13580845 - time (sec): 746.27 - samples/sec: 258.30 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:33:54,750 epoch 9 - iter 6750/7500 - loss 0.13419816 - time (sec): 839.30 - samples/sec: 258.02 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:35:28,480 epoch 9 - iter 7500/7500 - loss 0.13459907 - time (sec): 933.03 - samples/sec: 258.08 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:35:28,482 ----------------------------------------------------------------------------------------------------
2023-11-16 08:35:28,483 EPOCH 9 done: loss 0.1346 - lr: 0.000001
2023-11-16 08:35:55,875 DEV : loss 0.3105945885181427 - f1-score (micro avg) 0.9036
2023-11-16 08:35:58,080 saving best model
2023-11-16 08:36:01,035 ----------------------------------------------------------------------------------------------------
2023-11-16 08:37:36,044 epoch 10 - iter 750/7500 - loss 0.10551800 - time (sec): 95.01 - samples/sec: 248.68 - lr: 0.000001 - momentum: 0.000000
2023-11-16 08:39:09,059 epoch 10 - iter 1500/7500 - loss 0.11970928 - time (sec): 188.02 - samples/sec: 251.27 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:40:42,311 epoch 10 - iter 2250/7500 - loss 0.12199666 - time (sec): 281.27 - samples/sec: 256.11 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:42:13,894 epoch 10 - iter 3000/7500 - loss 0.12112190 - time (sec): 372.86 - samples/sec: 257.41 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:43:44,971 epoch 10 - iter 3750/7500 - loss 0.12198423 - time (sec): 463.93 - samples/sec: 259.71 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:45:19,480 epoch 10 - iter 4500/7500 - loss 0.11644070 - time (sec): 558.44 - samples/sec: 259.28 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:46:52,528 epoch 10 - iter 5250/7500 - loss 0.12094725 - time (sec): 651.49 - samples/sec: 259.32 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:48:29,025 epoch 10 - iter 6000/7500 - loss 0.11921992 - time (sec): 747.99 - samples/sec: 257.59 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:50:06,215 epoch 10 - iter 6750/7500 - loss 0.11723856 - time (sec): 845.18 - samples/sec: 256.47 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:51:43,737 epoch 10 - iter 7500/7500 - loss 0.11691516 - time (sec): 942.70 - samples/sec: 255.43 - lr: 0.000000 - momentum: 0.000000
2023-11-16 08:51:43,740 ----------------------------------------------------------------------------------------------------
2023-11-16 08:51:43,740 EPOCH 10 done: loss 0.1169 - lr: 0.000000
2023-11-16 08:52:11,462 DEV : loss 0.3263167440891266 - f1-score (micro avg) 0.905
2023-11-16 08:52:14,084 saving best model
2023-11-16 08:52:19,334 ----------------------------------------------------------------------------------------------------
2023-11-16 08:52:19,337 Loading model from best epoch ...
2023-11-16 08:52:29,363 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
2023-11-16 08:52:58,009
Results:
- F-score (micro) 0.9038
- F-score (macro) 0.9028
- Accuracy 0.8536
By class:
              precision    recall  f1-score   support

         LOC     0.9015    0.9153    0.9083      5288
         PER     0.9219    0.9417    0.9317      3962
         ORG     0.8674    0.8692    0.8683      3807

   micro avg     0.8979    0.9099    0.9038     13057
   macro avg     0.8969    0.9087    0.9028     13057
weighted avg     0.8977    0.9099    0.9037     13057
2023-11-16 08:52:58,009 ----------------------------------------------------------------------------------------------------
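Note: the final numbers above come from evaluating best-model.pt on the held-out test data (13,057 gold spans). A sketch of reproducing such an evaluation, under the same assumptions as the sketches above:

    from flair.datasets import NER_MULTI_XTREME
    from flair.models import SequenceTagger

    corpus = NER_MULTI_XTREME(languages=["en", "ka"])

    # Load the checkpoint saved on the best dev epoch of this run.
    tagger = SequenceTagger.load(
        "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-4/best-model.pt"
    )

    # Micro-averaged F1 is the headline metric reported in this log.
    result = tagger.evaluate(corpus.test, gold_label_type="ner", mini_batch_size=4)
    print(result.detailed_results)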