ST2_modernbert-large_product_V2

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4344
  • F1: 0.5304
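
The card does not yet document usage. Below is a minimal inference sketch, assuming the checkpoint carries a sequence-classification head and is hosted at BenPhan/ST2_modernbert-large_product_V2; the example input is a placeholder, and ModernBERT support requires Transformers >= 4.48:

```python
# Minimal inference sketch; assumes a sequence-classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "BenPhan/ST2_modernbert-large_product_V2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

text = "Example product description"  # placeholder input
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label.get(pred_id, pred_id))
```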

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 36
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch_fused (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 200
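
A sketch of how these settings map onto transformers TrainingArguments is below; the output directory, per-device batch semantics, and the per-epoch evaluation cadence are assumptions (the cadence is inferred from the results table):

```python
# Sketch: the listed hyperparameters expressed as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ST2_modernbert-large_product_V2",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",   # fused AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",       # one evaluation per epoch, matching the table below
)
```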

Training results

Training Loss | Epoch | Step | Validation Loss | F1
6.5291 1.0 124 6.1481 0.0198
5.7897 2.0 248 4.1589 0.2530
3.7526 3.0 372 3.1379 0.3966
1.9849 4.0 496 2.8741 0.4916
0.3139 5.0 620 2.8435 0.4837
0.2204 6.0 744 2.8593 0.5099
0.1745 7.0 868 3.0514 0.5088
0.1387 8.0 992 2.8805 0.5178
0.0734 9.0 1116 2.9076 0.5170
0.0646 10.0 1240 2.8863 0.5319
0.0289 11.0 1364 2.9510 0.5294
0.0443 12.0 1488 2.9005 0.5265
0.0251 13.0 1612 2.9952 0.5139
0.048 14.0 1736 2.9175 0.5338
0.008 15.0 1860 2.8908 0.5310
0.0137 16.0 1984 2.9082 0.5345
0.0089 17.0 2108 2.9541 0.5363
0.0137 18.0 2232 2.9846 0.5272
0.0136 19.0 2356 3.0461 0.5291
0.0119 20.0 2480 2.9575 0.5231
0.0054 21.0 2604 2.9088 0.5449
0.0071 22.0 2728 3.0477 0.5178
0.0244 23.0 2852 3.0782 0.5169
0.0276 24.0 2976 3.1306 0.5170
0.0665 25.0 3100 3.2074 0.5115
0.0457 26.0 3224 3.1918 0.5156
0.0269 27.0 3348 3.0512 0.5328
0.0118 28.0 3472 3.1252 0.5208
0.014 29.0 3596 3.1192 0.5366
0.0036 30.0 3720 3.1020 0.5379
0.0039 31.0 3844 3.0947 0.5419
0.0039 32.0 3968 3.1090 0.5405
0.0035 33.0 4092 3.1003 0.5402
0.0025 34.0 4216 3.1039 0.5432
0.0043 35.0 4340 3.1213 0.5421
0.002 36.0 4464 3.1008 0.5378
0.0029 37.0 4588 3.1063 0.5424
0.0024 38.0 4712 3.1267 0.5420
0.0038 39.0 4836 3.1236 0.5419
0.0027 40.0 4960 3.1090 0.5403
0.003 41.0 5084 3.1140 0.5405
0.003 42.0 5208 3.1305 0.5408
0.0067 43.0 5332 3.0086 0.5453
0.0083 44.0 5456 3.1991 0.5158
0.0394 45.0 5580 3.2579 0.5074
0.0763 46.0 5704 3.2113 0.5133
0.0419 47.0 5828 3.3182 0.5196
0.02 48.0 5952 3.2911 0.5242
0.0153 49.0 6076 3.2990 0.5241
0.01 50.0 6200 3.2567 0.5253
0.0034 51.0 6324 3.2880 0.5307
0.0016 52.0 6448 3.2883 0.5299
0.0037 53.0 6572 3.2832 0.5291
0.0024 54.0 6696 3.2893 0.5305
0.0029 55.0 6820 3.2853 0.5301
0.0013 56.0 6944 3.2936 0.5319
0.0042 57.0 7068 3.2950 0.5301
0.0028 58.0 7192 3.2941 0.5305
0.0027 59.0 7316 3.2915 0.5303
0.0024 60.0 7440 3.2996 0.5297
0.002 61.0 7564 3.3013 0.5313
0.0033 62.0 7688 3.3001 0.5312
0.0029 63.0 7812 3.3077 0.5291
0.0029 64.0 7936 3.3084 0.5324
0.002 65.0 8060 3.3016 0.5313
0.0032 66.0 8184 3.3133 0.5312
0.003 67.0 8308 3.3083 0.5312
0.0025 68.0 8432 3.3116 0.5300
0.0019 69.0 8556 3.3131 0.5311
0.0042 70.0 8680 3.3165 0.5321
0.0031 71.0 8804 3.3199 0.5306
0.0018 72.0 8928 3.3196 0.5307
0.0038 73.0 9052 3.3242 0.5293
0.0027 74.0 9176 3.3262 0.5314
0.0031 75.0 9300 3.3151 0.5307
0.0019 76.0 9424 3.3329 0.5299
0.0028 77.0 9548 3.3254 0.5300
0.0022 78.0 9672 3.3366 0.5287
0.0026 79.0 9796 3.3200 0.5317
0.003 80.0 9920 3.3351 0.5275
0.0016 81.0 10044 3.3367 0.5275
0.0027 82.0 10168 3.3283 0.5304
0.003 83.0 10292 3.3463 0.5325
0.0032 84.0 10416 3.3090 0.5330
0.0021 85.0 10540 3.3475 0.5305
0.0014 86.0 10664 3.3454 0.5282
0.003 87.0 10788 3.3513 0.5282
0.002 88.0 10912 3.3402 0.5326
0.0026 89.0 11036 3.3596 0.5320
0.0029 90.0 11160 3.3535 0.5329
0.0022 91.0 11284 3.3501 0.5290
0.0024 92.0 11408 3.3674 0.5298
0.0023 93.0 11532 3.3492 0.5319
0.0026 94.0 11656 3.3474 0.5346
0.0024 95.0 11780 3.3547 0.5286
0.0028 96.0 11904 3.3649 0.5305
0.0019 97.0 12028 3.3424 0.5307
0.003 98.0 12152 3.3582 0.5329
0.0029 99.0 12276 3.3664 0.5333
0.0025 100.0 12400 3.3732 0.5402
0.0021 101.0 12524 3.3765 0.5381
0.0026 102.0 12648 3.3801 0.5384
0.002 103.0 12772 3.3830 0.5351
0.0024 104.0 12896 3.3898 0.5360
0.0026 105.0 13020 3.3894 0.5383
0.0023 106.0 13144 3.3856 0.5353
0.0019 107.0 13268 3.3820 0.5340
0.0019 108.0 13392 3.3777 0.5365
0.0357 109.0 13516 3.2403 0.5198
0.0276 110.0 13640 3.3396 0.5119
0.0191 111.0 13764 3.3109 0.5063
0.0048 112.0 13888 3.2736 0.5128
0.0013 113.0 14012 3.2770 0.5133
0.0024 114.0 14136 3.2803 0.5139
0.0023 115.0 14260 3.2824 0.5141
0.0015 116.0 14384 3.2858 0.5144
0.0022 117.0 14508 3.2887 0.5158
0.0018 118.0 14632 3.2916 0.5184
0.0014 119.0 14756 3.2951 0.5168
0.0029 120.0 14880 3.2968 0.5177
0.002 121.0 15004 3.2975 0.5189
0.002 122.0 15128 3.3010 0.5183
0.0028 123.0 15252 3.3035 0.5193
0.0013 124.0 15376 3.3065 0.5203
0.0024 125.0 15500 3.3078 0.5200
0.0017 126.0 15624 3.3096 0.5212
0.0019 127.0 15748 3.3095 0.5206
0.0023 128.0 15872 3.3131 0.5234
0.002 129.0 15996 3.3153 0.5250
0.0022 130.0 16120 3.3188 0.5226
0.0018 131.0 16244 3.3204 0.5228
0.0024 132.0 16368 3.3209 0.5235
0.0021 133.0 16492 3.3222 0.5226
0.002 134.0 16616 3.3239 0.5248
0.0019 135.0 16740 3.3267 0.5244
0.0017 136.0 16864 3.3273 0.5233
0.0024 137.0 16988 3.3287 0.5252
0.0019 138.0 17112 3.3299 0.5267
0.0024 139.0 17236 3.3319 0.5255
0.0014 140.0 17360 3.3336 0.5239
0.002 141.0 17484 3.3339 0.5256
0.0021 142.0 17608 3.3358 0.5240
0.002 143.0 17732 3.3365 0.5259
0.0017 144.0 17856 3.3398 0.5259
0.0015 145.0 17980 3.3438 0.5248
0.0016 146.0 18104 3.3428 0.5241
0.002 147.0 18228 3.3448 0.5254
0.0013 148.0 18352 3.3469 0.5248
0.0027 149.0 18476 3.3495 0.5256
0.0016 150.0 18600 3.3509 0.5255
0.0017 151.0 18724 3.3539 0.5252
0.0023 152.0 18848 3.3545 0.5257
0.0024 153.0 18972 3.3580 0.5264
0.001 154.0 19096 3.3616 0.5251
0.0021 155.0 19220 3.3632 0.5251
0.0016 156.0 19344 3.3629 0.5250
0.0015 157.0 19468 3.3677 0.5240
0.0021 158.0 19592 3.3693 0.5236
0.0022 159.0 19716 3.3662 0.5245
0.0015 160.0 19840 3.3686 0.5228
0.0027 161.0 19964 3.3730 0.5235
0.0014 162.0 20088 3.3746 0.5239
0.0018 163.0 20212 3.3776 0.5253
0.0022 164.0 20336 3.3766 0.5264
0.0022 165.0 20460 3.3796 0.5258
0.0017 166.0 20584 3.3845 0.5250
0.0018 167.0 20708 3.3859 0.5252
0.0019 168.0 20832 3.3880 0.5250
0.0025 169.0 20956 3.3894 0.5258
0.0015 170.0 21080 3.3930 0.5264
0.0015 171.0 21204 3.3967 0.5271
0.0013 172.0 21328 3.3972 0.5267
0.0045 173.0 21452 3.3995 0.5272
0.0023 174.0 21576 3.4015 0.5271
0.0021 175.0 21700 3.4037 0.5287
0.0014 176.0 21824 3.4036 0.5271
0.0025 177.0 21948 3.4068 0.5273
0.0018 178.0 22072 3.4081 0.5283
0.0015 179.0 22196 3.4095 0.5286
0.0019 180.0 22320 3.4147 0.5281
0.0016 181.0 22444 3.4164 0.5283
0.0022 182.0 22568 3.4177 0.5287
0.0013 183.0 22692 3.4199 0.5280
0.0011 184.0 22816 3.4209 0.5303
0.0023 185.0 22940 3.4221 0.5295
0.002 186.0 23064 3.4245 0.5291
0.0011 187.0 23188 3.4253 0.5286
0.0019 188.0 23312 3.4259 0.5292
0.002 189.0 23436 3.4302 0.5305
0.0012 190.0 23560 3.4302 0.5304
0.0021 191.0 23684 3.4312 0.5297
0.0019 192.0 23808 3.4321 0.5306
0.0018 193.0 23932 3.4318 0.5306
0.0019 194.0 24056 3.4344 0.5284
0.0018 195.0 24180 3.4354 0.5304
0.0016 196.0 24304 3.4357 0.5309
0.0018 197.0 24428 3.4327 0.5320
0.002 198.0 24552 3.4365 0.5304
0.001 199.0 24676 3.4353 0.5304
0.0014 200.0 24800 3.4344 0.5304
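
The card does not state how the F1 column is computed. A minimal compute_metrics sketch is below, assuming argmax class predictions and a weighted-average F1 (the averaging mode is an assumption):

```python
# Sketch of a compute_metrics callback that could produce the F1 column above.
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # "weighted" is an assumption; the card does not name the F1 variant.
    return {"f1": f1_score(labels, predictions, average="weighted")}
```

Note that validation F1 peaks at 0.5449 (epoch 21) while the final epoch reports 0.5304; setting load_best_model_at_end=True with metric_for_best_model="f1" would be one way to keep the best checkpoint rather than the last.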

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.4.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.21.0