xlm-roberta-large-bs-16-lr-5e-05-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1

This model is a fine-tuned version of FacebookAI/xlm-roberta-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3291
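
The card does not state the training task. A minimal loading sketch follows, assuming a masked-language-modeling head (the pretraining objective of the XLM-RoBERTa base model); if the checkpoint was fine-tuned with a different head, swap in the matching Auto class.

```python
# Minimal loading sketch. AutoModelForMaskedLM is an assumption based on the
# base model's objective; the card does not specify the task head.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

repo_id = "BounharAbdelaziz/xlm-roberta-large-bs-16-lr-5e-05-ep-1-wp-0.1-gacc-8-gnm-1.0-FP16-mx-512-v0.1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

# XLM-RoBERTa uses <mask> as its mask token.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill("Paris is the <mask> of France."))
```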

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
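
A sketch of the corresponding Transformers TrainingArguments is below. The values mirror the list above; fp16 and max_grad_norm are not listed in the card and are inferred from the "FP16" and "gnm-1.0" tokens in the model name, so treat them as assumptions.

```python
# Hedged reconstruction of the training configuration from the list above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="xlm-roberta-large-finetune",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=8,   # 16 x 8 = 128 total train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    max_grad_norm=1.0,               # assumption, from "gnm-1.0" in the name
    fp16=True,                       # assumption, from "FP16" in the name
)
```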

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 17.8101 | 0.0055 | 50 | 5.0950 |
| 17.0019 | 0.0109 | 100 | 4.5201 |
| 15.6548 | 0.0164 | 150 | 4.3384 |
| 15.1777 | 0.0219 | 200 | 4.1739 |
| 15.6084 | 0.0273 | 250 | nan |
| 14.3951 | 0.0328 | 300 | 3.8169 |
| 14.184 | 0.0382 | 350 | 3.6496 |
| 14.157 | 0.0437 | 400 | 3.6370 |
| 13.78 | 0.0492 | 450 | 3.5758 |
| 13.7744 | 0.0546 | 500 | 3.4703 |
| 13.5207 | 0.0601 | 550 | 3.7431 |
| 12.8285 | 0.0656 | 600 | 3.4274 |
| 13.804 | 0.0710 | 650 | 3.2934 |
| 13.1016 | 0.0765 | 700 | 3.2929 |
| 12.9295 | 0.0819 | 750 | 3.2609 |
| 12.9975 | 0.0874 | 800 | 3.2154 |
| 12.8213 | 0.0929 | 850 | 3.1314 |
| 12.8268 | 0.0983 | 900 | 3.2730 |
| 12.9578 | 0.1038 | 950 | 3.1037 |
| 13.3053 | 0.1093 | 1000 | 3.1316 |
| 12.7619 | 0.1147 | 1050 | 3.0334 |
| 12.7588 | 0.1202 | 1100 | 3.0158 |
| 13.0244 | 0.1256 | 1150 | nan |
| 12.5915 | 0.1311 | 1200 | 3.0654 |
| 11.9712 | 0.1366 | 1250 | 2.9577 |
| 12.3582 | 0.1420 | 1300 | 2.9230 |
| 11.5631 | 0.1475 | 1350 | 2.9535 |
| 12.2369 | 0.1530 | 1400 | 3.0215 |
| 12.1179 | 0.1584 | 1450 | 2.8922 |
| 12.2686 | 0.1639 | 1500 | 2.8579 |
| 11.84 | 0.1694 | 1550 | 2.9253 |
| 11.6617 | 0.1748 | 1600 | 2.8686 |
| 11.6284 | 0.1803 | 1650 | 2.9694 |
| 11.8075 | 0.1857 | 1700 | 2.8212 |
| 11.5036 | 0.1912 | 1750 | 2.9165 |
| 11.5307 | 0.1967 | 1800 | 2.7684 |
| 11.7167 | 0.2021 | 1850 | nan |
| 12.3351 | 0.2076 | 1900 | 2.8803 |
| 11.8514 | 0.2131 | 1950 | 2.7556 |
| 11.9989 | 0.2185 | 2000 | 2.7531 |
| 12.0212 | 0.2240 | 2050 | 2.7811 |
| 11.5154 | 0.2294 | 2100 | 2.8039 |
| 11.8633 | 0.2349 | 2150 | 2.8538 |
| 11.5177 | 0.2404 | 2200 | 2.7256 |
| 11.5939 | 0.2458 | 2250 | 2.7699 |
| 11.6772 | 0.2513 | 2300 | 2.6950 |
| 11.2238 | 0.2568 | 2350 | 2.7304 |
| 11.1286 | 0.2622 | 2400 | 2.6525 |
| 11.7324 | 0.2677 | 2450 | 2.7490 |
| 10.8508 | 0.2731 | 2500 | 2.7722 |
| 10.8564 | 0.2786 | 2550 | 2.6763 |
| 11.4515 | 0.2841 | 2600 | 2.8086 |
| 11.0676 | 0.2895 | 2650 | 2.6937 |
| 11.21 | 0.2950 | 2700 | 2.7150 |
| 11.1875 | 0.3005 | 2750 | nan |
| 10.9272 | 0.3059 | 2800 | 2.7153 |
| 11.3898 | 0.3114 | 2850 | 2.7387 |
| 10.8959 | 0.3169 | 2900 | 2.7029 |
| 11.4243 | 0.3223 | 2950 | 2.6342 |
| 11.0173 | 0.3278 | 3000 | nan |
| 10.3994 | 0.3332 | 3050 | 2.4888 |
| 10.9072 | 0.3387 | 3100 | 2.6332 |
| 11.1628 | 0.3442 | 3150 | 2.6375 |
| 10.8527 | 0.3496 | 3200 | 2.5704 |
| 11.0833 | 0.3551 | 3250 | 2.6602 |
| 10.5689 | 0.3606 | 3300 | 2.5335 |
| 10.8759 | 0.3660 | 3350 | 2.5575 |
| 10.489 | 0.3715 | 3400 | 2.5462 |
| 10.7414 | 0.3769 | 3450 | 2.6745 |
| 10.8202 | 0.3824 | 3500 | nan |
| 10.7027 | 0.3879 | 3550 | 2.5444 |
| 11.4548 | 0.3933 | 3600 | 2.6391 |
| 10.4279 | 0.3988 | 3650 | 2.5813 |
| 10.726 | 0.4043 | 3700 | 2.5973 |
| 10.1897 | 0.4097 | 3750 | 2.5719 |
| 10.5646 | 0.4152 | 3800 | 2.6626 |
| 11.0231 | 0.4207 | 3850 | 2.5786 |
| 11.0557 | 0.4261 | 3900 | 2.6770 |
| 10.3466 | 0.4316 | 3950 | 2.5352 |
| 10.8437 | 0.4370 | 4000 | 2.7082 |
| 10.6587 | 0.4425 | 4050 | nan |
| 9.9986 | 0.4480 | 4100 | 2.5620 |
| 10.8423 | 0.4534 | 4150 | 2.5181 |
| 10.8241 | 0.4589 | 4200 | 2.5662 |
| 10.4568 | 0.4644 | 4250 | 2.5753 |
| 10.0628 | 0.4698 | 4300 | 2.5147 |
| 10.9293 | 0.4753 | 4350 | 2.5583 |
| 10.6637 | 0.4807 | 4400 | 2.4761 |
| 10.495 | 0.4862 | 4450 | 2.5910 |
| 10.2338 | 0.4917 | 4500 | 2.5183 |
| 10.4056 | 0.4971 | 4550 | 2.5513 |
| 10.373 | 0.5026 | 4600 | 2.4892 |
| 10.223 | 0.5081 | 4650 | nan |
| 10.7237 | 0.5135 | 4700 | 2.4571 |
| 10.473 | 0.5190 | 4750 | 2.5045 |
| 10.3394 | 0.5244 | 4800 | nan |
| 9.9574 | 0.5299 | 4850 | 2.4845 |
| 10.7453 | 0.5354 | 4900 | nan |
| 10.0733 | 0.5408 | 4950 | 2.5105 |
| 9.8847 | 0.5463 | 5000 | 2.5298 |
| 10.5273 | 0.5518 | 5050 | 2.5251 |
| 10.1006 | 0.5572 | 5100 | 2.5891 |
| 10.208 | 0.5627 | 5150 | 2.5482 |
| 9.9471 | 0.5682 | 5200 | 2.5731 |
| 10.2092 | 0.5736 | 5250 | 2.5134 |
| 9.8496 | 0.5791 | 5300 | 2.5534 |
| 10.1939 | 0.5845 | 5350 | 2.4982 |
| 10.1636 | 0.5900 | 5400 | 2.4370 |
| 9.962 | 0.5955 | 5450 | 2.4945 |
| 10.3635 | 0.6009 | 5500 | 2.5168 |
| 9.754 | 0.6064 | 5550 | 2.5053 |
| 10.2112 | 0.6119 | 5600 | 2.4416 |
| 9.9659 | 0.6173 | 5650 | 2.5780 |
| 9.6756 | 0.6228 | 5700 | 2.4121 |
| 9.9777 | 0.6282 | 5750 | 2.4450 |
| 9.9441 | 0.6337 | 5800 | 2.4634 |
| 10.4017 | 0.6392 | 5850 | 2.5407 |
| 10.0558 | 0.6446 | 5900 | 2.4228 |
| 9.7832 | 0.6501 | 5950 | 2.4340 |
| 9.9771 | 0.6556 | 6000 | 2.4906 |
| 9.4138 | 0.6610 | 6050 | 2.5171 |
| 10.2916 | 0.6665 | 6100 | 2.4348 |
| 9.8759 | 0.6719 | 6150 | 2.3867 |
| 9.9418 | 0.6774 | 6200 | 2.3981 |
| 9.6188 | 0.6829 | 6250 | 2.4660 |
| 9.8974 | 0.6883 | 6300 | 2.4299 |
| 10.0928 | 0.6938 | 6350 | 2.4024 |
| 9.9564 | 0.6993 | 6400 | 2.4812 |
| 9.7911 | 0.7047 | 6450 | 2.3437 |
| 10.3234 | 0.7102 | 6500 | 2.4240 |
| 9.8974 | 0.7157 | 6550 | 2.5699 |
| 9.2776 | 0.7211 | 6600 | 2.4354 |
| 9.7232 | 0.7266 | 6650 | 2.3804 |
| 10.05 | 0.7320 | 6700 | 2.4174 |
| 9.6149 | 0.7375 | 6750 | 2.4039 |
| 10.0379 | 0.7430 | 6800 | 2.5200 |
| 10.1982 | 0.7484 | 6850 | 2.4522 |
| 10.0545 | 0.7539 | 6900 | 2.4185 |
| 9.5577 | 0.7594 | 6950 | nan |
| 10.6035 | 0.7648 | 7000 | 2.3955 |
| 9.7875 | 0.7703 | 7050 | nan |
| 9.8262 | 0.7757 | 7100 | 2.4640 |
| 9.4249 | 0.7812 | 7150 | 2.3711 |
| 9.573 | 0.7867 | 7200 | 2.3369 |
| 9.5382 | 0.7921 | 7250 | 2.4253 |
| 9.4487 | 0.7976 | 7300 | 2.3971 |
| 9.6848 | 0.8031 | 7350 | 2.5155 |
| 9.1989 | 0.8085 | 7400 | nan |
| 9.1517 | 0.8140 | 7450 | 2.4483 |
| 10.0034 | 0.8194 | 7500 | 2.4458 |
| 9.2463 | 0.8249 | 7550 | 2.4025 |
| 9.8742 | 0.8304 | 7600 | 2.4496 |
| 9.8066 | 0.8358 | 7650 | 2.4838 |
| 9.2467 | 0.8413 | 7700 | 2.3789 |
| 9.6915 | 0.8468 | 7750 | 2.4223 |
| 9.9683 | 0.8522 | 7800 | 2.3724 |
| 9.5033 | 0.8577 | 7850 | 2.2997 |
| 9.4444 | 0.8632 | 7900 | 2.3901 |
| 9.5059 | 0.8686 | 7950 | 2.3708 |
| 9.513 | 0.8741 | 8000 | 2.3695 |
| 9.3093 | 0.8795 | 8050 | 2.4197 |
| 9.2414 | 0.8850 | 8100 | 2.4257 |
| 9.0852 | 0.8905 | 8150 | 2.3838 |
| 9.7345 | 0.8959 | 8200 | 2.4002 |
| 9.2903 | 0.9014 | 8250 | 2.3707 |
| 9.5652 | 0.9069 | 8300 | 2.3025 |
| 9.2533 | 0.9123 | 8350 | 2.3738 |
| 9.5378 | 0.9178 | 8400 | 2.4080 |
| 9.4812 | 0.9232 | 8450 | 2.4775 |
| 9.6664 | 0.9287 | 8500 | 2.3231 |
| 9.9709 | 0.9342 | 8550 | 2.3560 |
| 9.5003 | 0.9396 | 8600 | 2.3892 |
| 9.183 | 0.9451 | 8650 | 2.3027 |
| 9.4163 | 0.9506 | 8700 | 2.4888 |
| 10.2318 | 0.9560 | 8750 | 2.2755 |
| 9.6414 | 0.9615 | 8800 | 2.2422 |
| 9.2835 | 0.9669 | 8850 | 2.4216 |
| 9.5811 | 0.9724 | 8900 | 2.3790 |
| 9.0775 | 0.9779 | 8950 | 2.2990 |
| 9.3801 | 0.9833 | 9000 | 2.3856 |
| 9.5136 | 0.9888 | 9050 | nan |
| 9.6601 | 0.9943 | 9100 | 2.3805 |
| 9.7597 | 0.9997 | 9150 | 2.3291 |

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0
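
To reproduce results against these exact versions, a quick environment check can help. A convenience sketch (not part of the original card):

```python
# Compare installed library versions against those listed in this card.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.47.1",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.21.0",
}
for name, mod in [("transformers", transformers), ("torch", torch),
                  ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(f"{name}: installed {mod.__version__}, card used {expected[name]}")
```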