2023-11-16 00:45:31,082 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-11-16 00:45:31,084 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
 - ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
 - ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-16 00:45:31,084 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 Train: 30000 sentences
2023-11-16 00:45:31,084 (train_with_dev=False, train_with_test=False)
2023-11-16 00:45:31,084 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 Training Params:
2023-11-16 00:45:31,084 - learning_rate: "5e-06"
2023-11-16 00:45:31,084 - mini_batch_size: "4"
2023-11-16 00:45:31,084 - max_epochs: "10"
2023-11-16 00:45:31,084 - shuffle: "True"
2023-11-16 00:45:31,084 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 Plugins:
2023-11-16 00:45:31,084 - TensorboardLogger
2023-11-16 00:45:31,084 - LinearScheduler | warmup_fraction: '0.1'
2023-11-16 00:45:31,084 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,084 Final evaluation on model from best epoch (best-model.pt)
2023-11-16 00:45:31,085 - metric: "('micro avg', 'f1-score')"
2023-11-16 00:45:31,085 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,085 Computation:
2023-11-16 00:45:31,085 - compute on device: cuda:0
2023-11-16 00:45:31,085 - embedding storage: none
2023-11-16 00:45:31,085 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,085 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-2"
2023-11-16 00:45:31,085 ----------------------------------------------------------------------------------------------------
2023-11-16 00:45:31,085 ----------------------------------------------------------------------------------------------------
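[Note: the `lr` column in the epoch logs below follows the LinearScheduler plugin with warmup_fraction 0.1: the learning rate ramps linearly from 0 up to the peak 5e-06 over the first 10% of training (10 epochs × 7500 mini-batches = 75,000 steps, so warmup spans exactly epoch 1), then decays linearly back to 0. A minimal sketch of that schedule, assuming one scheduler step per mini-batch; `linear_lr` is a hypothetical helper written for illustration, not Flair's actual implementation:]

```python
def linear_lr(step: int, total_steps: int, peak_lr: float,
              warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup phase: 0 -> peak_lr over the first warmup_fraction of training
        return peak_lr * step / warmup_steps
    # decay phase: peak_lr -> 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 10 * 7500  # 75,000 optimizer steps in this run
print(f"{linear_lr(3750, total, 5e-6):.7f}")   # mid-warmup  -> 0.0000025
print(f"{linear_lr(7500, total, 5e-6):.7f}")   # end of epoch 1 (peak) -> 0.0000050
print(f"{linear_lr(75000, total, 5e-6):.7f}")  # end of training -> 0.0000000
```

[This matches the logged values: lr reaches 0.000005 at iter 7500 of epoch 1 and falls back toward 0.000000 by epoch 10.]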
2023-11-16 00:45:31,085 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-16 00:47:04,086 epoch 1 - iter 750/7500 - loss 3.20421711 - time (sec): 93.00 - samples/sec: 254.76 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:48:36,937 epoch 1 - iter 1500/7500 - loss 2.53046773 - time (sec): 185.85 - samples/sec: 256.83 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:50:07,872 epoch 1 - iter 2250/7500 - loss 2.16462133 - time (sec): 276.79 - samples/sec: 258.95 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:51:39,794 epoch 1 - iter 3000/7500 - loss 1.89258823 - time (sec): 368.71 - samples/sec: 258.86 - lr: 0.000002 - momentum: 0.000000
2023-11-16 00:53:11,581 epoch 1 - iter 3750/7500 - loss 1.66150429 - time (sec): 460.49 - samples/sec: 259.63 - lr: 0.000002 - momentum: 0.000000
2023-11-16 00:54:42,341 epoch 1 - iter 4500/7500 - loss 1.47974045 - time (sec): 551.25 - samples/sec: 261.00 - lr: 0.000003 - momentum: 0.000000
2023-11-16 00:56:14,348 epoch 1 - iter 5250/7500 - loss 1.33779802 - time (sec): 643.26 - samples/sec: 261.74 - lr: 0.000003 - momentum: 0.000000
2023-11-16 00:57:44,891 epoch 1 - iter 6000/7500 - loss 1.23224942 - time (sec): 733.80 - samples/sec: 262.47 - lr: 0.000004 - momentum: 0.000000
2023-11-16 00:59:17,561 epoch 1 - iter 6750/7500 - loss 1.14528322 - time (sec): 826.47 - samples/sec: 262.22 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:00:50,271 epoch 1 - iter 7500/7500 - loss 1.07465936 - time (sec): 919.18 - samples/sec: 261.97 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:00:50,273 ----------------------------------------------------------------------------------------------------
2023-11-16 01:00:50,273 EPOCH 1 done: loss 1.0747 - lr: 0.000005
2023-11-16 01:01:17,256 DEV : loss 0.2856157124042511 - f1-score (micro avg) 0.8045
2023-11-16 01:01:18,998 saving best model
2023-11-16 01:01:20,796 ----------------------------------------------------------------------------------------------------
2023-11-16 01:02:54,269 epoch 2 - iter 750/7500 - loss 0.40158514 - time (sec): 93.47 - samples/sec: 252.67 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:04:26,600 epoch 2 - iter 1500/7500 - loss 0.40578126 - time (sec): 185.80 - samples/sec: 257.91 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:06:01,665 epoch 2 - iter 2250/7500 - loss 0.40182467 - time (sec): 280.87 - samples/sec: 256.06 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:07:35,409 epoch 2 - iter 3000/7500 - loss 0.40425251 - time (sec): 374.61 - samples/sec: 255.46 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:09:06,889 epoch 2 - iter 3750/7500 - loss 0.40579040 - time (sec): 466.09 - samples/sec: 256.59 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:10:37,461 epoch 2 - iter 4500/7500 - loss 0.40296543 - time (sec): 556.66 - samples/sec: 257.85 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:12:10,834 epoch 2 - iter 5250/7500 - loss 0.39886416 - time (sec): 650.03 - samples/sec: 258.59 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:13:43,058 epoch 2 - iter 6000/7500 - loss 0.40163262 - time (sec): 742.26 - samples/sec: 258.99 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:15:15,768 epoch 2 - iter 6750/7500 - loss 0.39984468 - time (sec): 834.97 - samples/sec: 259.04 - lr: 0.000005 - momentum: 0.000000
2023-11-16 01:16:48,990 epoch 2 - iter 7500/7500 - loss 0.39673334 - time (sec): 928.19 - samples/sec: 259.43 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:16:48,992 ----------------------------------------------------------------------------------------------------
2023-11-16 01:16:48,992 EPOCH 2 done: loss 0.3967 - lr: 0.000004
2023-11-16 01:17:16,238 DEV : loss 0.27984410524368286 - f1-score (micro avg) 0.8635
2023-11-16 01:17:18,468 saving best model
2023-11-16 01:17:21,478 ----------------------------------------------------------------------------------------------------
2023-11-16 01:18:56,267 epoch 3 - iter 750/7500 - loss 0.34788380 - time (sec): 94.78 - samples/sec: 252.61 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:20:29,227 epoch 3 - iter 1500/7500 - loss 0.35790619 - time (sec): 187.74 - samples/sec: 257.86 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:22:00,989 epoch 3 - iter 2250/7500 - loss 0.36275974 - time (sec): 279.51 - samples/sec: 256.05 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:23:32,112 epoch 3 - iter 3000/7500 - loss 0.35448466 - time (sec): 370.63 - samples/sec: 257.61 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:25:04,514 epoch 3 - iter 3750/7500 - loss 0.35784162 - time (sec): 463.03 - samples/sec: 258.62 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:26:37,957 epoch 3 - iter 4500/7500 - loss 0.35491385 - time (sec): 556.48 - samples/sec: 259.13 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:28:09,894 epoch 3 - iter 5250/7500 - loss 0.35268653 - time (sec): 648.41 - samples/sec: 259.90 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:29:43,647 epoch 3 - iter 6000/7500 - loss 0.35432380 - time (sec): 742.17 - samples/sec: 259.26 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:31:15,583 epoch 3 - iter 6750/7500 - loss 0.35073442 - time (sec): 834.10 - samples/sec: 259.69 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:32:47,991 epoch 3 - iter 7500/7500 - loss 0.34845272 - time (sec): 926.51 - samples/sec: 259.90 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:32:47,993 ----------------------------------------------------------------------------------------------------
2023-11-16 01:32:47,993 EPOCH 3 done: loss 0.3485 - lr: 0.000004
2023-11-16 01:33:14,670 DEV : loss 0.2744104862213135 - f1-score (micro avg) 0.8834
2023-11-16 01:33:16,359 saving best model
2023-11-16 01:33:18,657 ----------------------------------------------------------------------------------------------------
2023-11-16 01:34:49,914 epoch 4 - iter 750/7500 - loss 0.29966012 - time (sec): 91.25 - samples/sec: 265.65 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:36:21,479 epoch 4 - iter 1500/7500 - loss 0.29512059 - time (sec): 182.82 - samples/sec: 262.70 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:37:53,047 epoch 4 - iter 2250/7500 - loss 0.29934339 - time (sec): 274.38 - samples/sec: 262.65 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:39:25,499 epoch 4 - iter 3000/7500 - loss 0.29881097 - time (sec): 366.84 - samples/sec: 263.27 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:40:56,808 epoch 4 - iter 3750/7500 - loss 0.29543460 - time (sec): 458.15 - samples/sec: 264.24 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:42:30,166 epoch 4 - iter 4500/7500 - loss 0.29266567 - time (sec): 551.50 - samples/sec: 262.68 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:44:03,529 epoch 4 - iter 5250/7500 - loss 0.29429266 - time (sec): 644.87 - samples/sec: 262.58 - lr: 0.000004 - momentum: 0.000000
2023-11-16 01:45:36,284 epoch 4 - iter 6000/7500 - loss 0.29201070 - time (sec): 737.62 - samples/sec: 261.99 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:47:11,437 epoch 4 - iter 6750/7500 - loss 0.29527772 - time (sec): 832.77 - samples/sec: 260.57 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:48:47,245 epoch 4 - iter 7500/7500 - loss 0.29729031 - time (sec): 928.58 - samples/sec: 259.32 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:48:47,248 ----------------------------------------------------------------------------------------------------
2023-11-16 01:48:47,248 EPOCH 4 done: loss 0.2973 - lr: 0.000003
2023-11-16 01:49:14,914 DEV : loss 0.3016064167022705 - f1-score (micro avg) 0.88
2023-11-16 01:49:16,910 ----------------------------------------------------------------------------------------------------
2023-11-16 01:50:49,496 epoch 5 - iter 750/7500 - loss 0.25377931 - time (sec): 92.58 - samples/sec: 262.37 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:52:20,605 epoch 5 - iter 1500/7500 - loss 0.25135538 - time (sec): 183.69 - samples/sec: 265.72 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:53:53,577 epoch 5 - iter 2250/7500 - loss 0.25580791 - time (sec): 276.66 - samples/sec: 262.18 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:55:26,799 epoch 5 - iter 3000/7500 - loss 0.25964203 - time (sec): 369.89 - samples/sec: 260.86 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:57:00,788 epoch 5 - iter 3750/7500 - loss 0.26132921 - time (sec): 463.87 - samples/sec: 259.14 - lr: 0.000003 - momentum: 0.000000
2023-11-16 01:58:33,972 epoch 5 - iter 4500/7500 - loss 0.26036055 - time (sec): 557.06 - samples/sec: 258.72 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:00:09,582 epoch 5 - iter 5250/7500 - loss 0.25878497 - time (sec): 652.67 - samples/sec: 257.81 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:01:48,359 epoch 5 - iter 6000/7500 - loss 0.25604978 - time (sec): 751.45 - samples/sec: 256.54 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:03:25,431 epoch 5 - iter 6750/7500 - loss 0.25651438 - time (sec): 848.52 - samples/sec: 255.47 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:04:59,400 epoch 5 - iter 7500/7500 - loss 0.25488422 - time (sec): 942.49 - samples/sec: 255.49 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:04:59,403 ----------------------------------------------------------------------------------------------------
2023-11-16 02:04:59,403 EPOCH 5 done: loss 0.2549 - lr: 0.000003
2023-11-16 02:05:26,452 DEV : loss 0.3108203411102295 - f1-score (micro avg) 0.8923
2023-11-16 02:05:28,547 saving best model
2023-11-16 02:05:31,087 ----------------------------------------------------------------------------------------------------
2023-11-16 02:07:03,915 epoch 6 - iter 750/7500 - loss 0.20941918 - time (sec): 92.82 - samples/sec: 258.36 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:08:34,259 epoch 6 - iter 1500/7500 - loss 0.20871668 - time (sec): 183.17 - samples/sec: 261.51 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:10:06,286 epoch 6 - iter 2250/7500 - loss 0.21719166 - time (sec): 275.20 - samples/sec: 261.07 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:11:38,611 epoch 6 - iter 3000/7500 - loss 0.22345226 - time (sec): 367.52 - samples/sec: 260.76 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:13:13,015 epoch 6 - iter 3750/7500 - loss 0.21948790 - time (sec): 461.92 - samples/sec: 260.22 - lr: 0.000003 - momentum: 0.000000
2023-11-16 02:14:48,651 epoch 6 - iter 4500/7500 - loss 0.22195698 - time (sec): 557.56 - samples/sec: 257.83 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:16:23,096 epoch 6 - iter 5250/7500 - loss 0.22241666 - time (sec): 652.01 - samples/sec: 257.73 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:17:57,154 epoch 6 - iter 6000/7500 - loss 0.22130923 - time (sec): 746.06 - samples/sec: 258.53 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:19:29,542 epoch 6 - iter 6750/7500 - loss 0.21994780 - time (sec): 838.45 - samples/sec: 258.60 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:21:01,165 epoch 6 - iter 7500/7500 - loss 0.21770578 - time (sec): 930.07 - samples/sec: 258.90 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:21:01,168 ----------------------------------------------------------------------------------------------------
2023-11-16 02:21:01,168 EPOCH 6 done: loss 0.2177 - lr: 0.000002
2023-11-16 02:21:28,850 DEV : loss 0.31180956959724426 - f1-score (micro avg) 0.8955
2023-11-16 02:21:31,381 saving best model
2023-11-16 02:21:34,381 ----------------------------------------------------------------------------------------------------
2023-11-16 02:23:08,670 epoch 7 - iter 750/7500 - loss 0.17025570 - time (sec): 94.28 - samples/sec: 253.94 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:24:44,294 epoch 7 - iter 1500/7500 - loss 0.18032455 - time (sec): 189.91 - samples/sec: 253.38 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:26:19,207 epoch 7 - iter 2250/7500 - loss 0.18368583 - time (sec): 284.82 - samples/sec: 253.98 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:27:52,445 epoch 7 - iter 3000/7500 - loss 0.18638293 - time (sec): 378.06 - samples/sec: 254.49 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:29:25,782 epoch 7 - iter 3750/7500 - loss 0.18144838 - time (sec): 471.40 - samples/sec: 255.30 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:30:59,415 epoch 7 - iter 4500/7500 - loss 0.18697815 - time (sec): 565.03 - samples/sec: 255.93 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:32:31,970 epoch 7 - iter 5250/7500 - loss 0.18690520 - time (sec): 657.58 - samples/sec: 256.63 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:34:05,459 epoch 7 - iter 6000/7500 - loss 0.18389577 - time (sec): 751.07 - samples/sec: 256.38 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:35:40,741 epoch 7 - iter 6750/7500 - loss 0.18345948 - time (sec): 846.36 - samples/sec: 255.76 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:37:17,970 epoch 7 - iter 7500/7500 - loss 0.18350743 - time (sec): 943.58 - samples/sec: 255.19 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:37:17,972 ----------------------------------------------------------------------------------------------------
2023-11-16 02:37:17,972 EPOCH 7 done: loss 0.1835 - lr: 0.000002
2023-11-16 02:37:45,548 DEV : loss 0.31052064895629883 - f1-score (micro avg) 0.901
2023-11-16 02:37:47,535 saving best model
2023-11-16 02:37:49,958 ----------------------------------------------------------------------------------------------------
2023-11-16 02:39:25,797 epoch 8 - iter 750/7500 - loss 0.14917987 - time (sec): 95.84 - samples/sec: 255.82 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:40:58,825 epoch 8 - iter 1500/7500 - loss 0.16554104 - time (sec): 188.86 - samples/sec: 254.74 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:42:33,026 epoch 8 - iter 2250/7500 - loss 0.16246413 - time (sec): 283.06 - samples/sec: 254.01 - lr: 0.000002 - momentum: 0.000000
2023-11-16 02:44:05,013 epoch 8 - iter 3000/7500 - loss 0.15793136 - time (sec): 375.05 - samples/sec: 254.92 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:45:37,970 epoch 8 - iter 3750/7500 - loss 0.15705842 - time (sec): 468.01 - samples/sec: 255.77 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:47:09,872 epoch 8 - iter 4500/7500 - loss 0.15757577 - time (sec): 559.91 - samples/sec: 256.37 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:48:43,951 epoch 8 - iter 5250/7500 - loss 0.15530409 - time (sec): 653.99 - samples/sec: 256.53 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:50:15,610 epoch 8 - iter 6000/7500 - loss 0.15633332 - time (sec): 745.65 - samples/sec: 258.01 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:51:49,707 epoch 8 - iter 6750/7500 - loss 0.15781340 - time (sec): 839.75 - samples/sec: 258.51 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:53:22,758 epoch 8 - iter 7500/7500 - loss 0.15738201 - time (sec): 932.80 - samples/sec: 258.14 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:53:22,761 ----------------------------------------------------------------------------------------------------
2023-11-16 02:53:22,761 EPOCH 8 done: loss 0.1574 - lr: 0.000001
2023-11-16 02:53:49,642 DEV : loss 0.31349387764930725 - f1-score (micro avg) 0.9012
2023-11-16 02:53:51,537 saving best model
2023-11-16 02:53:53,925 ----------------------------------------------------------------------------------------------------
2023-11-16 02:55:30,620 epoch 9 - iter 750/7500 - loss 0.12454614 - time (sec): 96.69 - samples/sec: 248.06 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:57:02,449 epoch 9 - iter 1500/7500 - loss 0.12492215 - time (sec): 188.52 - samples/sec: 254.80 - lr: 0.000001 - momentum: 0.000000
2023-11-16 02:58:34,812 epoch 9 - iter 2250/7500 - loss 0.13002767 - time (sec): 280.88 - samples/sec: 257.65 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:00:06,592 epoch 9 - iter 3000/7500 - loss 0.12936109 - time (sec): 372.66 - samples/sec: 259.06 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:01:36,302 epoch 9 - iter 3750/7500 - loss 0.12949282 - time (sec): 462.37 - samples/sec: 260.59 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:03:09,938 epoch 9 - iter 4500/7500 - loss 0.13100535 - time (sec): 556.01 - samples/sec: 259.28 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:04:42,142 epoch 9 - iter 5250/7500 - loss 0.13241958 - time (sec): 648.21 - samples/sec: 259.22 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:06:15,744 epoch 9 - iter 6000/7500 - loss 0.13197722 - time (sec): 741.81 - samples/sec: 260.12 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:07:52,365 epoch 9 - iter 6750/7500 - loss 0.13101789 - time (sec): 838.44 - samples/sec: 258.40 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:09:30,882 epoch 9 - iter 7500/7500 - loss 0.13192127 - time (sec): 936.95 - samples/sec: 257.00 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:09:30,885 ----------------------------------------------------------------------------------------------------
2023-11-16 03:09:30,886 EPOCH 9 done: loss 0.1319 - lr: 0.000001
2023-11-16 03:09:58,661 DEV : loss 0.3276961147785187 - f1-score (micro avg) 0.9002
2023-11-16 03:10:01,082 ----------------------------------------------------------------------------------------------------
2023-11-16 03:11:36,851 epoch 10 - iter 750/7500 - loss 0.10811317 - time (sec): 95.77 - samples/sec: 255.65 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:13:11,188 epoch 10 - iter 1500/7500 - loss 0.10879497 - time (sec): 190.10 - samples/sec: 251.97 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:14:44,600 epoch 10 - iter 2250/7500 - loss 0.11179486 - time (sec): 283.52 - samples/sec: 253.42 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:16:16,891 epoch 10 - iter 3000/7500 - loss 0.11110255 - time (sec): 375.81 - samples/sec: 256.07 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:17:51,280 epoch 10 - iter 3750/7500 - loss 0.11617599 - time (sec): 470.20 - samples/sec: 254.90 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:19:22,531 epoch 10 - iter 4500/7500 - loss 0.11661813 - time (sec): 561.45 - samples/sec: 256.54 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:20:53,037 epoch 10 - iter 5250/7500 - loss 0.11803804 - time (sec): 651.95 - samples/sec: 257.89 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:22:24,015 epoch 10 - iter 6000/7500 - loss 0.11722958 - time (sec): 742.93 - samples/sec: 258.60 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:23:55,236 epoch 10 - iter 6750/7500 - loss 0.11710786 - time (sec): 834.15 - samples/sec: 260.07 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:25:24,728 epoch 10 - iter 7500/7500 - loss 0.11716487 - time (sec): 923.64 - samples/sec: 260.70 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:25:24,731 ----------------------------------------------------------------------------------------------------
2023-11-16 03:25:24,731 EPOCH 10 done: loss 0.1172 - lr: 0.000000
2023-11-16 03:25:51,758 DEV : loss 0.32983964681625366 - f1-score (micro avg) 0.9006
2023-11-16 03:25:55,590 ----------------------------------------------------------------------------------------------------
2023-11-16 03:25:55,592 Loading model from best epoch ...
2023-11-16 03:26:03,736 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
2023-11-16 03:26:31,850 Results:
- F-score (micro) 0.9027
- F-score (macro) 0.9014
- Accuracy 0.8521

By class:
              precision    recall  f1-score   support

         LOC     0.9036    0.9141    0.9088      5288
         PER     0.9238    0.9427    0.9332      3962
         ORG     0.8593    0.8650    0.8622      3807

   micro avg     0.8969    0.9085    0.9027     13057
   macro avg     0.8956    0.9073    0.9014     13057
weighted avg     0.8968    0.9085    0.9026     13057

2023-11-16 03:26:31,850 ----------------------------------------------------------------------------------------------------
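[Note: the averaged scores in the final evaluation can be cross-checked from the per-class rows. A minimal sketch, written for illustration (the `f1` helper is not Flair code); it recomputes the macro average (unweighted mean of class F1s), the support-weighted average, and the micro F1 from the micro-averaged precision/recall. Because the inputs are already rounded to four decimals, individual class F1s can be off by one in the last digit, but the three averages reproduce the logged values exactly:]

```python
# Per-class (precision, recall, support) copied from the evaluation table above.
classes = {
    "LOC": (0.9036, 0.9141, 5288),
    "PER": (0.9238, 0.9427, 3962),
    "ORG": (0.8593, 0.8650, 3807),
}

def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

total_support = sum(s for _, _, s in classes.values())
# macro: unweighted mean over classes; weighted: mean weighted by support
macro_f1 = sum(f1(p, r) for p, r, _ in classes.values()) / len(classes)
weighted_f1 = sum(f1(p, r) * s for p, r, s in classes.values()) / total_support
# micro: F1 of the pooled (micro-averaged) precision/recall from the table
micro_f1 = f1(0.8969, 0.9085)

print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
# -> 0.9014 0.9026 0.9027, matching the macro avg / weighted avg / micro avg rows
```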