ST2_modernbert-large_product_V2
This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 3.4344
- F1: 0.5304
Model description
More information needed
Intended uses & limitations
More information needed
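Since the intended use is not documented, here is a hypothetical inference sketch. The reported F1 metric suggests a classification objective, so this assumes the checkpoint carries a sequence-classification head and is published under the repository name in the title; verify both against the actual model files.

```python
def load_classifier(repo_id: str = "BenPhan/ST2_modernbert-large_product_V2"):
    """Build a text-classification pipeline for the fine-tuned checkpoint.

    The import is done lazily so this file can be read and imported without
    the transformers library; calling the function downloads the model from
    the Hugging Face Hub, which requires network access.
    """
    from transformers import pipeline
    return pipeline("text-classification", model=repo_id)

# Example (requires network access and the transformers library installed):
# clf = load_classifier()
# print(clf("example product text"))
```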
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 36
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 200
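With a linear scheduler, the learning rate decays from 5e-05 to zero over the 24,800 total optimizer steps recorded in the results table (200 epochs at 124 steps each). A minimal sketch of that schedule, assuming no warmup steps (warmup settings are not reported here):

```python
INITIAL_LR = 5e-05
TOTAL_STEPS = 24_800  # 200 epochs x 124 steps/epoch, from the results table

def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay to zero.

    Assumes no warmup phase; with warmup the rate would first ramp up
    from zero before decaying.
    """
    if step >= TOTAL_STEPS:
        return 0.0
    return INITIAL_LR * (1 - step / TOTAL_STEPS)

# Halfway through training (step 12,400) the rate has halved to 2.5e-05.
assert abs(linear_lr(12_400) - 2.5e-05) < 1e-12
```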
Training results
Training Loss | Epoch | Step | Validation Loss | F1 |
---|---|---|---|---|
6.5291 | 1.0 | 124 | 6.1481 | 0.0198 |
5.7897 | 2.0 | 248 | 4.1589 | 0.2530 |
3.7526 | 3.0 | 372 | 3.1379 | 0.3966 |
1.9849 | 4.0 | 496 | 2.8741 | 0.4916 |
0.3139 | 5.0 | 620 | 2.8435 | 0.4837 |
0.2204 | 6.0 | 744 | 2.8593 | 0.5099 |
0.1745 | 7.0 | 868 | 3.0514 | 0.5088 |
0.1387 | 8.0 | 992 | 2.8805 | 0.5178 |
0.0734 | 9.0 | 1116 | 2.9076 | 0.5170 |
0.0646 | 10.0 | 1240 | 2.8863 | 0.5319 |
0.0289 | 11.0 | 1364 | 2.9510 | 0.5294 |
0.0443 | 12.0 | 1488 | 2.9005 | 0.5265 |
0.0251 | 13.0 | 1612 | 2.9952 | 0.5139 |
0.048 | 14.0 | 1736 | 2.9175 | 0.5338 |
0.008 | 15.0 | 1860 | 2.8908 | 0.5310 |
0.0137 | 16.0 | 1984 | 2.9082 | 0.5345 |
0.0089 | 17.0 | 2108 | 2.9541 | 0.5363 |
0.0137 | 18.0 | 2232 | 2.9846 | 0.5272 |
0.0136 | 19.0 | 2356 | 3.0461 | 0.5291 |
0.0119 | 20.0 | 2480 | 2.9575 | 0.5231 |
0.0054 | 21.0 | 2604 | 2.9088 | 0.5449 |
0.0071 | 22.0 | 2728 | 3.0477 | 0.5178 |
0.0244 | 23.0 | 2852 | 3.0782 | 0.5169 |
0.0276 | 24.0 | 2976 | 3.1306 | 0.5170 |
0.0665 | 25.0 | 3100 | 3.2074 | 0.5115 |
0.0457 | 26.0 | 3224 | 3.1918 | 0.5156 |
0.0269 | 27.0 | 3348 | 3.0512 | 0.5328 |
0.0118 | 28.0 | 3472 | 3.1252 | 0.5208 |
0.014 | 29.0 | 3596 | 3.1192 | 0.5366 |
0.0036 | 30.0 | 3720 | 3.1020 | 0.5379 |
0.0039 | 31.0 | 3844 | 3.0947 | 0.5419 |
0.0039 | 32.0 | 3968 | 3.1090 | 0.5405 |
0.0035 | 33.0 | 4092 | 3.1003 | 0.5402 |
0.0025 | 34.0 | 4216 | 3.1039 | 0.5432 |
0.0043 | 35.0 | 4340 | 3.1213 | 0.5421 |
0.002 | 36.0 | 4464 | 3.1008 | 0.5378 |
0.0029 | 37.0 | 4588 | 3.1063 | 0.5424 |
0.0024 | 38.0 | 4712 | 3.1267 | 0.5420 |
0.0038 | 39.0 | 4836 | 3.1236 | 0.5419 |
0.0027 | 40.0 | 4960 | 3.1090 | 0.5403 |
0.003 | 41.0 | 5084 | 3.1140 | 0.5405 |
0.003 | 42.0 | 5208 | 3.1305 | 0.5408 |
0.0067 | 43.0 | 5332 | 3.0086 | 0.5453 |
0.0083 | 44.0 | 5456 | 3.1991 | 0.5158 |
0.0394 | 45.0 | 5580 | 3.2579 | 0.5074 |
0.0763 | 46.0 | 5704 | 3.2113 | 0.5133 |
0.0419 | 47.0 | 5828 | 3.3182 | 0.5196 |
0.02 | 48.0 | 5952 | 3.2911 | 0.5242 |
0.0153 | 49.0 | 6076 | 3.2990 | 0.5241 |
0.01 | 50.0 | 6200 | 3.2567 | 0.5253 |
0.0034 | 51.0 | 6324 | 3.2880 | 0.5307 |
0.0016 | 52.0 | 6448 | 3.2883 | 0.5299 |
0.0037 | 53.0 | 6572 | 3.2832 | 0.5291 |
0.0024 | 54.0 | 6696 | 3.2893 | 0.5305 |
0.0029 | 55.0 | 6820 | 3.2853 | 0.5301 |
0.0013 | 56.0 | 6944 | 3.2936 | 0.5319 |
0.0042 | 57.0 | 7068 | 3.2950 | 0.5301 |
0.0028 | 58.0 | 7192 | 3.2941 | 0.5305 |
0.0027 | 59.0 | 7316 | 3.2915 | 0.5303 |
0.0024 | 60.0 | 7440 | 3.2996 | 0.5297 |
0.002 | 61.0 | 7564 | 3.3013 | 0.5313 |
0.0033 | 62.0 | 7688 | 3.3001 | 0.5312 |
0.0029 | 63.0 | 7812 | 3.3077 | 0.5291 |
0.0029 | 64.0 | 7936 | 3.3084 | 0.5324 |
0.002 | 65.0 | 8060 | 3.3016 | 0.5313 |
0.0032 | 66.0 | 8184 | 3.3133 | 0.5312 |
0.003 | 67.0 | 8308 | 3.3083 | 0.5312 |
0.0025 | 68.0 | 8432 | 3.3116 | 0.5300 |
0.0019 | 69.0 | 8556 | 3.3131 | 0.5311 |
0.0042 | 70.0 | 8680 | 3.3165 | 0.5321 |
0.0031 | 71.0 | 8804 | 3.3199 | 0.5306 |
0.0018 | 72.0 | 8928 | 3.3196 | 0.5307 |
0.0038 | 73.0 | 9052 | 3.3242 | 0.5293 |
0.0027 | 74.0 | 9176 | 3.3262 | 0.5314 |
0.0031 | 75.0 | 9300 | 3.3151 | 0.5307 |
0.0019 | 76.0 | 9424 | 3.3329 | 0.5299 |
0.0028 | 77.0 | 9548 | 3.3254 | 0.5300 |
0.0022 | 78.0 | 9672 | 3.3366 | 0.5287 |
0.0026 | 79.0 | 9796 | 3.3200 | 0.5317 |
0.003 | 80.0 | 9920 | 3.3351 | 0.5275 |
0.0016 | 81.0 | 10044 | 3.3367 | 0.5275 |
0.0027 | 82.0 | 10168 | 3.3283 | 0.5304 |
0.003 | 83.0 | 10292 | 3.3463 | 0.5325 |
0.0032 | 84.0 | 10416 | 3.3090 | 0.5330 |
0.0021 | 85.0 | 10540 | 3.3475 | 0.5305 |
0.0014 | 86.0 | 10664 | 3.3454 | 0.5282 |
0.003 | 87.0 | 10788 | 3.3513 | 0.5282 |
0.002 | 88.0 | 10912 | 3.3402 | 0.5326 |
0.0026 | 89.0 | 11036 | 3.3596 | 0.5320 |
0.0029 | 90.0 | 11160 | 3.3535 | 0.5329 |
0.0022 | 91.0 | 11284 | 3.3501 | 0.5290 |
0.0024 | 92.0 | 11408 | 3.3674 | 0.5298 |
0.0023 | 93.0 | 11532 | 3.3492 | 0.5319 |
0.0026 | 94.0 | 11656 | 3.3474 | 0.5346 |
0.0024 | 95.0 | 11780 | 3.3547 | 0.5286 |
0.0028 | 96.0 | 11904 | 3.3649 | 0.5305 |
0.0019 | 97.0 | 12028 | 3.3424 | 0.5307 |
0.003 | 98.0 | 12152 | 3.3582 | 0.5329 |
0.0029 | 99.0 | 12276 | 3.3664 | 0.5333 |
0.0025 | 100.0 | 12400 | 3.3732 | 0.5402 |
0.0021 | 101.0 | 12524 | 3.3765 | 0.5381 |
0.0026 | 102.0 | 12648 | 3.3801 | 0.5384 |
0.002 | 103.0 | 12772 | 3.3830 | 0.5351 |
0.0024 | 104.0 | 12896 | 3.3898 | 0.5360 |
0.0026 | 105.0 | 13020 | 3.3894 | 0.5383 |
0.0023 | 106.0 | 13144 | 3.3856 | 0.5353 |
0.0019 | 107.0 | 13268 | 3.3820 | 0.5340 |
0.0019 | 108.0 | 13392 | 3.3777 | 0.5365 |
0.0357 | 109.0 | 13516 | 3.2403 | 0.5198 |
0.0276 | 110.0 | 13640 | 3.3396 | 0.5119 |
0.0191 | 111.0 | 13764 | 3.3109 | 0.5063 |
0.0048 | 112.0 | 13888 | 3.2736 | 0.5128 |
0.0013 | 113.0 | 14012 | 3.2770 | 0.5133 |
0.0024 | 114.0 | 14136 | 3.2803 | 0.5139 |
0.0023 | 115.0 | 14260 | 3.2824 | 0.5141 |
0.0015 | 116.0 | 14384 | 3.2858 | 0.5144 |
0.0022 | 117.0 | 14508 | 3.2887 | 0.5158 |
0.0018 | 118.0 | 14632 | 3.2916 | 0.5184 |
0.0014 | 119.0 | 14756 | 3.2951 | 0.5168 |
0.0029 | 120.0 | 14880 | 3.2968 | 0.5177 |
0.002 | 121.0 | 15004 | 3.2975 | 0.5189 |
0.002 | 122.0 | 15128 | 3.3010 | 0.5183 |
0.0028 | 123.0 | 15252 | 3.3035 | 0.5193 |
0.0013 | 124.0 | 15376 | 3.3065 | 0.5203 |
0.0024 | 125.0 | 15500 | 3.3078 | 0.5200 |
0.0017 | 126.0 | 15624 | 3.3096 | 0.5212 |
0.0019 | 127.0 | 15748 | 3.3095 | 0.5206 |
0.0023 | 128.0 | 15872 | 3.3131 | 0.5234 |
0.002 | 129.0 | 15996 | 3.3153 | 0.5250 |
0.0022 | 130.0 | 16120 | 3.3188 | 0.5226 |
0.0018 | 131.0 | 16244 | 3.3204 | 0.5228 |
0.0024 | 132.0 | 16368 | 3.3209 | 0.5235 |
0.0021 | 133.0 | 16492 | 3.3222 | 0.5226 |
0.002 | 134.0 | 16616 | 3.3239 | 0.5248 |
0.0019 | 135.0 | 16740 | 3.3267 | 0.5244 |
0.0017 | 136.0 | 16864 | 3.3273 | 0.5233 |
0.0024 | 137.0 | 16988 | 3.3287 | 0.5252 |
0.0019 | 138.0 | 17112 | 3.3299 | 0.5267 |
0.0024 | 139.0 | 17236 | 3.3319 | 0.5255 |
0.0014 | 140.0 | 17360 | 3.3336 | 0.5239 |
0.002 | 141.0 | 17484 | 3.3339 | 0.5256 |
0.0021 | 142.0 | 17608 | 3.3358 | 0.5240 |
0.002 | 143.0 | 17732 | 3.3365 | 0.5259 |
0.0017 | 144.0 | 17856 | 3.3398 | 0.5259 |
0.0015 | 145.0 | 17980 | 3.3438 | 0.5248 |
0.0016 | 146.0 | 18104 | 3.3428 | 0.5241 |
0.002 | 147.0 | 18228 | 3.3448 | 0.5254 |
0.0013 | 148.0 | 18352 | 3.3469 | 0.5248 |
0.0027 | 149.0 | 18476 | 3.3495 | 0.5256 |
0.0016 | 150.0 | 18600 | 3.3509 | 0.5255 |
0.0017 | 151.0 | 18724 | 3.3539 | 0.5252 |
0.0023 | 152.0 | 18848 | 3.3545 | 0.5257 |
0.0024 | 153.0 | 18972 | 3.3580 | 0.5264 |
0.001 | 154.0 | 19096 | 3.3616 | 0.5251 |
0.0021 | 155.0 | 19220 | 3.3632 | 0.5251 |
0.0016 | 156.0 | 19344 | 3.3629 | 0.5250 |
0.0015 | 157.0 | 19468 | 3.3677 | 0.5240 |
0.0021 | 158.0 | 19592 | 3.3693 | 0.5236 |
0.0022 | 159.0 | 19716 | 3.3662 | 0.5245 |
0.0015 | 160.0 | 19840 | 3.3686 | 0.5228 |
0.0027 | 161.0 | 19964 | 3.3730 | 0.5235 |
0.0014 | 162.0 | 20088 | 3.3746 | 0.5239 |
0.0018 | 163.0 | 20212 | 3.3776 | 0.5253 |
0.0022 | 164.0 | 20336 | 3.3766 | 0.5264 |
0.0022 | 165.0 | 20460 | 3.3796 | 0.5258 |
0.0017 | 166.0 | 20584 | 3.3845 | 0.5250 |
0.0018 | 167.0 | 20708 | 3.3859 | 0.5252 |
0.0019 | 168.0 | 20832 | 3.3880 | 0.5250 |
0.0025 | 169.0 | 20956 | 3.3894 | 0.5258 |
0.0015 | 170.0 | 21080 | 3.3930 | 0.5264 |
0.0015 | 171.0 | 21204 | 3.3967 | 0.5271 |
0.0013 | 172.0 | 21328 | 3.3972 | 0.5267 |
0.0045 | 173.0 | 21452 | 3.3995 | 0.5272 |
0.0023 | 174.0 | 21576 | 3.4015 | 0.5271 |
0.0021 | 175.0 | 21700 | 3.4037 | 0.5287 |
0.0014 | 176.0 | 21824 | 3.4036 | 0.5271 |
0.0025 | 177.0 | 21948 | 3.4068 | 0.5273 |
0.0018 | 178.0 | 22072 | 3.4081 | 0.5283 |
0.0015 | 179.0 | 22196 | 3.4095 | 0.5286 |
0.0019 | 180.0 | 22320 | 3.4147 | 0.5281 |
0.0016 | 181.0 | 22444 | 3.4164 | 0.5283 |
0.0022 | 182.0 | 22568 | 3.4177 | 0.5287 |
0.0013 | 183.0 | 22692 | 3.4199 | 0.5280 |
0.0011 | 184.0 | 22816 | 3.4209 | 0.5303 |
0.0023 | 185.0 | 22940 | 3.4221 | 0.5295 |
0.002 | 186.0 | 23064 | 3.4245 | 0.5291 |
0.0011 | 187.0 | 23188 | 3.4253 | 0.5286 |
0.0019 | 188.0 | 23312 | 3.4259 | 0.5292 |
0.002 | 189.0 | 23436 | 3.4302 | 0.5305 |
0.0012 | 190.0 | 23560 | 3.4302 | 0.5304 |
0.0021 | 191.0 | 23684 | 3.4312 | 0.5297 |
0.0019 | 192.0 | 23808 | 3.4321 | 0.5306 |
0.0018 | 193.0 | 23932 | 3.4318 | 0.5306 |
0.0019 | 194.0 | 24056 | 3.4344 | 0.5284 |
0.0018 | 195.0 | 24180 | 3.4354 | 0.5304 |
0.0016 | 196.0 | 24304 | 3.4357 | 0.5309 |
0.0018 | 197.0 | 24428 | 3.4327 | 0.5320 |
0.002 | 198.0 | 24552 | 3.4365 | 0.5304 |
0.001 | 199.0 | 24676 | 3.4353 | 0.5304 |
0.0014 | 200.0 | 24800 | 3.4344 | 0.5304 |
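As a sanity check on the numbers above: each epoch covers 124 optimizer steps at a train batch size of 36, which bounds the training-set size, assuming one optimizer step per batch and that a final partial batch is kept rather than dropped:

```python
import math

STEPS_PER_EPOCH = 124  # from the results table (step 124 at epoch 1.0)
TRAIN_BATCH_SIZE = 36  # from the hyperparameters above

# ceil(n / 36) == 124 implies 123 * 36 < n <= 124 * 36
min_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1  # 4429
max_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE            # 4464

assert math.ceil(min_examples / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
assert math.ceil(max_examples / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
```

So the training set holds between 4,429 and 4,464 examples; the eval loss and F1 in the table come from the separate evaluation set.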
Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.4.1+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0
Model tree for BenPhan/ST2_modernbert-large_product_V2
- Base model: answerdotai/ModernBERT-large