modernbert-llm-router

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset. It achieves the following results on the evaluation set (the per-class figures are derived from the confusion matrix; see the sketch after this list):

  • Loss: 3.7888
  • F1: 0.7785
  • Macro F1: 0.7785
  • Precision: 0.7832
  • Cross Entropy: 0.5213
  • Min Class Accuracy: 0.658
  • Confusion Matrix: [[938, 56, 6], [232, 658, 110], [20, 232, 748]]
  • Accuracy Class 0: 0.938
  • Accuracy Class 1: 0.658
  • Accuracy Class 2: 0.748
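
The per-class accuracies and the min-class accuracy above can be recovered from the confusion matrix: each row sums to 1,000, so rows read as true classes and columns as predicted classes. Below is a minimal sketch of that arithmetic in plain NumPy; it is illustrative and not part of the original evaluation code.

```python
import numpy as np

# Confusion matrix from the evaluation set above
# (rows assumed to be true classes, columns predicted classes).
cm = np.array([[938,  56,   6],
               [232, 658, 110],
               [ 20, 232, 748]])

# Per-class accuracy (recall): correct predictions / examples of that class.
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(per_class_acc)                   # [0.938 0.658 0.748]

# "Min Class Accuracy" is the worst per-class accuracy.
print(per_class_acc.min())             # 0.658

# Overall accuracy, for comparison.
print(cm.diagonal().sum() / cm.sum())  # ~0.781
```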

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
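
For reference, these values map onto a Hugging Face TrainingArguments configuration roughly as sketched below. This is a hedged reconstruction, not the original training script; output_dir and any setting not listed above are assumptions.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the listed hyperparameters; output_dir,
# logging/eval cadence, and other unlisted settings are assumptions.
training_args = TrainingArguments(
    output_dir="modernbert-llm-router",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,       # 4 x 16 = 64 effective train batch size
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```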

Training results

| Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 67.1584 | 0.0939 | 4000 | 4.3133 | 0.6905 | 0.6905 | 0.7174 | 0.6445 | 0.49 | [[964, 32, 4], [427, 490, 83], [80, 274, 646]] | 0.964 | 0.49 | 0.646 |
| 66.8186 | 0.0986 | 4200 | 4.2763 | 0.7056 | 0.7056 | 0.7332 | 0.6243 | 0.57 | [[951, 45, 4], [370, 570, 60], [62, 325, 613]] | 0.951 | 0.57 | 0.613 |
| 66.666 | 0.1033 | 4400 | 4.3091 | 0.6693 | 0.6693 | 0.7083 | 0.6652 | 0.534 | [[959, 39, 2], [400, 542, 58], [71, 395, 534]] | 0.959 | 0.542 | 0.534 |
| 66.5493 | 0.1080 | 4600 | 4.2950 | 0.6930 | 0.6930 | 0.7279 | 0.6567 | 0.531 | [[937, 58, 5], [314, 628, 58], [29, 440, 531]] | 0.937 | 0.628 | 0.531 |
| 66.4201 | 0.1127 | 4800 | 4.2264 | 0.7317 | 0.7317 | 0.7578 | 0.5987 | 0.583 | [[935, 62, 3], [250, 687, 63], [23, 394, 583]] | 0.935 | 0.687 | 0.583 |
| 66.2139 | 0.1174 | 5000 | 4.3305 | 0.6664 | 0.6664 | 0.6993 | 0.6979 | 0.501 | [[969, 28, 3], [423, 501, 76], [73, 366, 561]] | 0.969 | 0.501 | 0.561 |
| 65.8371 | 0.1221 | 5200 | 4.2472 | 0.6927 | 0.6927 | 0.7244 | 0.6322 | 0.562 | [[968, 30, 2], [375, 562, 63], [60, 368, 572]] | 0.968 | 0.562 | 0.572 |
| 65.7941 | 0.1268 | 5400 | 4.2528 | 0.7054 | 0.7054 | 0.7398 | 0.6475 | 0.553 | [[958, 40, 2], [323, 625, 52], [53, 394, 553]] | 0.958 | 0.625 | 0.553 |
| 65.5811 | 0.1314 | 5600 | 4.2208 | 0.7232 | 0.7232 | 0.7341 | 0.6087 | 0.57 | [[941, 52, 7], [320, 570, 110], [27, 298, 675]] | 0.941 | 0.57 | 0.675 |
| 65.4718 | 0.1361 | 5800 | 4.1907 | 0.7218 | 0.7218 | 0.7446 | 0.5945 | 0.589 | [[948, 48, 4], [282, 643, 75], [36, 375, 589]] | 0.948 | 0.643 | 0.589 |
| 65.2497 | 0.1408 | 6000 | 4.2049 | 0.7034 | 0.7034 | 0.7274 | 0.6252 | 0.591 | [[947, 47, 6], [330, 591, 79], [69, 340, 591]] | 0.947 | 0.591 | 0.591 |
| 65.1997 | 0.1455 | 6200 | 4.2574 | 0.6679 | 0.6679 | 0.7118 | 0.6690 | 0.526 | [[968, 29, 3], [431, 526, 43], [69, 393, 538]] | 0.968 | 0.526 | 0.538 |
| 65.0097 | 0.1502 | 6400 | 4.1857 | 0.7235 | 0.7235 | 0.7612 | 0.6106 | 0.552 | [[950, 50, 0], [277, 685, 38], [59, 389, 552]] | 0.95 | 0.685 | 0.552 |
| 64.8219 | 0.1549 | 6600 | 4.2046 | 0.7092 | 0.7092 | 0.7356 | 0.6392 | 0.584 | [[958, 38, 4], [329, 604, 67], [40, 376, 584]] | 0.958 | 0.604 | 0.584 |
| 64.7263 | 0.1596 | 6800 | 4.2920 | 0.6220 | 0.6220 | 0.7073 | 0.7250 | 0.355 | [[955, 45, 0], [360, 615, 25], [67, 578, 355]] | 0.955 | 0.615 | 0.355 |
| 64.6495 | 0.1643 | 7000 | 4.1895 | 0.6843 | 0.6843 | 0.7222 | 0.6356 | 0.537 | [[957, 42, 1], [364, 582, 54], [57, 406, 537]] | 0.957 | 0.582 | 0.537 |
| 64.3158 | 0.1690 | 7200 | 4.2260 | 0.6940 | 0.6940 | 0.7358 | 0.6636 | 0.521 | [[959, 38, 3], [331, 626, 43], [65, 414, 521]] | 0.959 | 0.626 | 0.521 |
| 64.3062 | 0.1737 | 7400 | 4.1442 | 0.7167 | 0.7167 | 0.7442 | 0.6048 | 0.599 | [[954, 41, 5], [331, 613, 56], [56, 345, 599]] | 0.954 | 0.613 | 0.599 |
| 64.0348 | 0.1784 | 7600 | 4.1174 | 0.7345 | 0.7345 | 0.7454 | 0.5703 | 0.583 | [[955, 36, 9], [304, 583, 113], [46, 270, 684]] | 0.955 | 0.583 | 0.684 |
| 64.0196 | 0.1831 | 7800 | 4.1144 | 0.7247 | 0.7247 | 0.7462 | 0.5859 | 0.613 | [[952, 46, 2], [301, 624, 75], [50, 337, 613]] | 0.952 | 0.624 | 0.613 |
| 63.8908 | 0.1878 | 8000 | 4.1876 | 0.6717 | 0.6717 | 0.7164 | 0.6808 | 0.5 | [[949, 49, 2], [361, 591, 48], [46, 454, 500]] | 0.949 | 0.591 | 0.5 |
| 63.7917 | 0.1925 | 8200 | 4.1457 | 0.6920 | 0.6920 | 0.7347 | 0.6319 | 0.506 | [[938, 59, 3], [302, 652, 46], [40, 454, 506]] | 0.938 | 0.652 | 0.506 |
| 63.6911 | 0.1972 | 8400 | 4.1293 | 0.7201 | 0.7201 | 0.7390 | 0.6186 | 0.619 | [[935, 61, 4], [297, 619, 84], [49, 332, 619]] | 0.935 | 0.619 | 0.619 |
| 63.4114 | 0.2019 | 8600 | 4.1279 | 0.6899 | 0.6899 | 0.7372 | 0.6263 | 0.501 | [[959, 41, 0], [327, 635, 38], [52, 447, 501]] | 0.959 | 0.635 | 0.501 |
| 63.312 | 0.2066 | 8800 | 4.1076 | 0.7273 | 0.7273 | 0.7630 | 0.6074 | 0.559 | [[944, 56, 0], [267, 692, 41], [34, 407, 559]] | 0.944 | 0.692 | 0.559 |
| 63.2253 | 0.2113 | 9000 | 4.0827 | 0.7207 | 0.7207 | 0.7399 | 0.5908 | 0.587 | [[955, 42, 3], [331, 587, 82], [45, 318, 637]] | 0.955 | 0.587 | 0.637 |
| 63.1791 | 0.2159 | 9200 | 4.1021 | 0.7144 | 0.7144 | 0.7459 | 0.6179 | 0.582 | [[957, 42, 1], [328, 621, 51], [58, 360, 582]] | 0.957 | 0.621 | 0.582 |
| 62.9892 | 0.2206 | 9400 | 4.0988 | 0.6994 | 0.6994 | 0.7354 | 0.6213 | 0.568 | [[965, 33, 2], [368, 586, 46], [59, 373, 568]] | 0.965 | 0.586 | 0.568 |
| 62.9596 | 0.2253 | 9600 | 4.0299 | 0.7349 | 0.7349 | 0.7547 | 0.5633 | 0.622 | [[953, 46, 1], [309, 622, 69], [33, 325, 642]] | 0.953 | 0.622 | 0.642 |
| 62.7207 | 0.2300 | 9800 | 4.1452 | 0.6911 | 0.6911 | 0.7252 | 0.6585 | 0.543 | [[966, 32, 2], [404, 543, 53], [64, 349, 587]] | 0.966 | 0.543 | 0.587 |
| 62.6702 | 0.2347 | 10000 | 4.0368 | 0.7287 | 0.7287 | 0.7537 | 0.5715 | 0.602 | [[947, 50, 3], [292, 649, 59], [32, 366, 602]] | 0.947 | 0.649 | 0.602 |
| 62.5446 | 0.2394 | 10200 | 4.0780 | 0.7032 | 0.7032 | 0.7319 | 0.6159 | 0.576 | [[962, 37, 1], [362, 576, 62], [44, 365, 591]] | 0.962 | 0.576 | 0.591 |
| 62.3668 | 0.2441 | 10400 | 4.1327 | 0.6841 | 0.6841 | 0.7314 | 0.6766 | 0.491 | [[959, 38, 3], [331, 629, 40], [39, 470, 491]] | 0.959 | 0.629 | 0.491 |
| 62.2441 | 0.2488 | 10600 | 4.0393 | 0.7446 | 0.7446 | 0.7581 | 0.5807 | 0.645 | [[951, 43, 6], [265, 645, 90], [30, 320, 650]] | 0.951 | 0.645 | 0.65 |
| 62.2008 | 0.2535 | 10800 | 3.9862 | 0.7569 | 0.7569 | 0.7644 | 0.5375 | 0.612 | [[952, 44, 4], [273, 612, 115], [30, 249, 721]] | 0.952 | 0.612 | 0.721 |
| 62.1253 | 0.2582 | 11000 | 4.0344 | 0.7083 | 0.7083 | 0.7336 | 0.6014 | 0.558 | [[964, 32, 4], [376, 558, 66], [49, 328, 623]] | 0.964 | 0.558 | 0.623 |
| 61.9099 | 0.2629 | 11200 | 4.0261 | 0.7336 | 0.7336 | 0.7616 | 0.5951 | 0.583 | [[937, 58, 5], [256, 692, 52], [40, 377, 583]] | 0.937 | 0.692 | 0.583 |
| 61.7465 | 0.2676 | 11400 | 4.1156 | 0.6506 | 0.6506 | 0.7096 | 0.6897 | 0.457 | [[967, 33, 0], [403, 563, 34], [41, 502, 457]] | 0.967 | 0.563 | 0.457 |
| 61.7948 | 0.2723 | 11600 | 4.0768 | 0.6828 | 0.6828 | 0.7276 | 0.6630 | 0.526 | [[963, 36, 1], [380, 583, 37], [49, 425, 526]] | 0.963 | 0.583 | 0.526 |
| 61.583 | 0.2770 | 11800 | 4.0815 | 0.6942 | 0.6942 | 0.7298 | 0.6597 | 0.57 | [[962, 37, 1], [379, 570, 51], [71, 357, 572]] | 0.962 | 0.57 | 0.572 |
| 61.4462 | 0.2817 | 12000 | 4.0190 | 0.7266 | 0.7266 | 0.7531 | 0.6087 | 0.596 | [[952, 47, 1], [297, 645, 58], [36, 368, 596]] | 0.952 | 0.645 | 0.596 |
| 61.1997 | 0.2864 | 12200 | 3.9755 | 0.7559 | 0.7559 | 0.7765 | 0.5705 | 0.623 | [[949, 48, 3], [234, 704, 62], [15, 362, 623]] | 0.949 | 0.704 | 0.623 |
| 61.2348 | 0.2911 | 12400 | 3.9294 | 0.7562 | 0.7562 | 0.7719 | 0.5367 | 0.657 | [[955, 43, 2], [271, 657, 72], [34, 299, 667]] | 0.955 | 0.657 | 0.667 |
| 61.0238 | 0.2958 | 12600 | 4.0619 | 0.6856 | 0.6856 | 0.7384 | 0.6624 | 0.473 | [[951, 49, 0], [305, 658, 37], [17, 510, 473]] | 0.951 | 0.658 | 0.473 |
| 61.0242 | 0.3005 | 12800 | 3.9927 | 0.6979 | 0.6979 | 0.7304 | 0.6148 | 0.571 | [[967, 31, 2], [368, 577, 55], [40, 389, 571]] | 0.967 | 0.577 | 0.571 |
| 60.8857 | 0.3051 | 13000 | 3.9335 | 0.7601 | 0.7601 | 0.7726 | 0.5507 | 0.668 | [[937, 59, 4], [234, 682, 84], [22, 310, 668]] | 0.937 | 0.682 | 0.668 |
| 60.7716 | 0.3098 | 13200 | 4.0305 | 0.7096 | 0.7096 | 0.7453 | 0.6505 | 0.565 | [[960, 39, 1], [337, 621, 42], [39, 396, 565]] | 0.96 | 0.621 | 0.565 |
| 60.6762 | 0.3145 | 13400 | 3.9114 | 0.7727 | 0.7727 | 0.7840 | 0.5417 | 0.686 | [[921, 77, 2], [203, 713, 84], [15, 299, 686]] | 0.921 | 0.713 | 0.686 |
| 60.5466 | 0.3192 | 13600 | 3.9843 | 0.7247 | 0.7247 | 0.7476 | 0.6131 | 0.601 | [[958, 41, 1], [296, 631, 73], [42, 357, 601]] | 0.958 | 0.631 | 0.601 |
| 60.5036 | 0.3239 | 13800 | 3.9504 | 0.7298 | 0.7298 | 0.7534 | 0.5888 | 0.593 | [[948, 49, 3], [271, 661, 68], [28, 379, 593]] | 0.948 | 0.661 | 0.593 |
| 60.3629 | 0.3286 | 14000 | 3.9785 | 0.7168 | 0.7168 | 0.7382 | 0.6287 | 0.544 | [[965, 33, 2], [381, 544, 75], [44, 294, 662]] | 0.965 | 0.544 | 0.662 |
| 60.259 | 0.3333 | 14200 | 3.8927 | 0.7358 | 0.7358 | 0.7532 | 0.5535 | 0.631 | [[943, 55, 2], [292, 631, 77], [30, 326, 644]] | 0.943 | 0.631 | 0.644 |
| 60.0233 | 0.3380 | 14400 | 3.9874 | 0.7001 | 0.7001 | 0.7249 | 0.6405 | 0.536 | [[968, 28, 4], [394, 536, 70], [32, 349, 619]] | 0.968 | 0.536 | 0.619 |
| 59.9092 | 0.3427 | 14600 | 3.9597 | 0.7173 | 0.7173 | 0.7462 | 0.6215 | 0.571 | [[943, 53, 4], [291, 651, 58], [23, 406, 571]] | 0.943 | 0.651 | 0.571 |
| 59.827 | 0.3474 | 14800 | 3.9515 | 0.7201 | 0.7201 | 0.7480 | 0.6193 | 0.589 | [[952, 46, 2], [309, 634, 57], [44, 367, 589]] | 0.952 | 0.634 | 0.589 |
| 59.7204 | 0.3521 | 15000 | 3.8741 | 0.7626 | 0.7626 | 0.7738 | 0.5534 | 0.666 | [[929, 65, 6], [210, 698, 92], [20, 314, 666]] | 0.929 | 0.698 | 0.666 |
| 59.667 | 0.3568 | 15200 | 3.9475 | 0.7208 | 0.7208 | 0.7517 | 0.6281 | 0.553 | [[940, 57, 3], [257, 683, 60], [23, 424, 553]] | 0.94 | 0.683 | 0.553 |
| 59.542 | 0.3615 | 15400 | 3.9530 | 0.6913 | 0.6913 | 0.7331 | 0.6452 | 0.532 | [[961, 39, 0], [357, 602, 41], [32, 436, 532]] | 0.961 | 0.602 | 0.532 |
| 59.4635 | 0.3662 | 15600 | 3.9513 | 0.7037 | 0.7037 | 0.7344 | 0.6504 | 0.582 | [[965, 32, 3], [360, 582, 58], [61, 354, 585]] | 0.965 | 0.582 | 0.585 |
| 59.2113 | 0.3709 | 15800 | 3.8991 | 0.7461 | 0.7461 | 0.7663 | 0.5940 | 0.632 | [[947, 51, 2], [269, 668, 63], [21, 347, 632]] | 0.947 | 0.668 | 0.632 |
| 59.2237 | 0.3756 | 16000 | 3.9010 | 0.7422 | 0.7422 | 0.7663 | 0.5978 | 0.611 | [[943, 55, 2], [263, 681, 56], [17, 372, 611]] | 0.943 | 0.681 | 0.611 |
| 59.0031 | 0.3803 | 16200 | 3.9380 | 0.7139 | 0.7139 | 0.7373 | 0.6298 | 0.573 | [[961, 37, 2], [357, 573, 70], [43, 331, 626]] | 0.961 | 0.573 | 0.626 |
| 59.0405 | 0.3850 | 16400 | 3.8858 | 0.7261 | 0.7261 | 0.7552 | 0.6002 | 0.583 | [[950, 48, 2], [290, 658, 52], [25, 392, 583]] | 0.95 | 0.658 | 0.583 |
| 58.8338 | 0.3896 | 16600 | 3.8715 | 0.7270 | 0.7270 | 0.7510 | 0.5955 | 0.595 | [[946, 51, 3], [280, 653, 67], [38, 367, 595]] | 0.946 | 0.653 | 0.595 |
| 58.8746 | 0.3943 | 16800 | 3.9528 | 0.7002 | 0.7002 | 0.7382 | 0.6745 | 0.553 | [[956, 42, 2], [349, 610, 41], [45, 402, 553]] | 0.956 | 0.61 | 0.553 |
| 58.6048 | 0.3990 | 17000 | 3.8640 | 0.7368 | 0.7368 | 0.7687 | 0.5965 | 0.575 | [[938, 61, 1], [248, 707, 45], [23, 402, 575]] | 0.938 | 0.707 | 0.575 |
| 58.4862 | 0.4037 | 17200 | 3.9524 | 0.6945 | 0.6945 | 0.7404 | 0.6891 | 0.511 | [[952, 46, 2], [323, 642, 35], [40, 449, 511]] | 0.952 | 0.642 | 0.511 |
| 58.4176 | 0.4084 | 17400 | 3.7888 | 0.7785 | 0.7785 | 0.7832 | 0.5213 | 0.658 | [[938, 56, 6], [232, 658, 110], [20, 232, 748]] | 0.938 | 0.658 | 0.748 |
| 58.3256 | 0.4131 | 17600 | 3.8355 | 0.7427 | 0.7427 | 0.7628 | 0.5795 | 0.626 | [[946, 52, 2], [266, 666, 68], [28, 346, 626]] | 0.946 | 0.666 | 0.626 |
| 58.193 | 0.4178 | 17800 | 3.9014 | 0.7114 | 0.7114 | 0.7458 | 0.6464 | 0.551 | [[951, 48, 1], [300, 649, 51], [37, 412, 551]] | 0.951 | 0.649 | 0.551 |
| 58.0348 | 0.4225 | 18000 | 3.9505 | 0.6624 | 0.6624 | 0.7268 | 0.7058 | 0.441 | [[961, 39, 0], [354, 620, 26], [35, 524, 441]] | 0.961 | 0.62 | 0.441 |
| 57.9114 | 0.4272 | 18200 | 3.9220 | 0.7076 | 0.7076 | 0.7400 | 0.6741 | 0.577 | [[964, 35, 1], [350, 600, 50], [37, 386, 577]] | 0.964 | 0.6 | 0.577 |
| 57.9024 | 0.4319 | 18400 | 3.8897 | 0.6970 | 0.6970 | 0.7320 | 0.6580 | 0.551 | [[964, 34, 2], [351, 597, 52], [35, 414, 551]] | 0.964 | 0.597 | 0.551 |
| 57.819 | 0.4366 | 18600 | 3.8781 | 0.7186 | 0.7186 | 0.7462 | 0.6470 | 0.587 | [[959, 39, 2], [315, 626, 59], [35, 378, 587]] | 0.959 | 0.626 | 0.587 |
| 57.7407 | 0.4413 | 18800 | 3.9488 | 0.6797 | 0.6797 | 0.7243 | 0.7260 | 0.498 | [[960, 38, 2], [345, 608, 47], [44, 458, 498]] | 0.96 | 0.608 | 0.498 |
| 57.5802 | 0.4460 | 19000 | 3.8100 | 0.7301 | 0.7301 | 0.7501 | 0.6038 | 0.619 | [[952, 46, 2], [293, 632, 75], [22, 359, 619]] | 0.952 | 0.632 | 0.619 |
| 57.4922 | 0.4507 | 19200 | 3.8570 | 0.7275 | 0.7275 | 0.7461 | 0.6478 | 0.616 | [[953, 46, 1], [301, 616, 83], [39, 333, 628]] | 0.953 | 0.616 | 0.628 |
| 57.3413 | 0.4554 | 19400 | 3.8799 | 0.6979 | 0.6979 | 0.7352 | 0.6764 | 0.539 | [[951, 47, 2], [328, 623, 49], [44, 417, 539]] | 0.951 | 0.623 | 0.539 |
| 57.2835 | 0.4601 | 19600 | 3.7841 | 0.7574 | 0.7574 | 0.7780 | 0.5797 | 0.618 | [[928, 69, 3], [200, 732, 68], [24, 358, 618]] | 0.928 | 0.732 | 0.618 |
| 57.0308 | 0.4648 | 19800 | 3.8292 | 0.7367 | 0.7367 | 0.7564 | 0.6354 | 0.617 | [[942, 54, 4], [266, 661, 73], [21, 362, 617]] | 0.942 | 0.661 | 0.617 |
| 56.9914 | 0.4695 | 20000 | 3.8292 | 0.7219 | 0.7219 | 0.7428 | 0.6366 | 0.601 | [[951, 46, 3], [324, 601, 75], [52, 319, 629]] | 0.951 | 0.601 | 0.629 |
| 56.868 | 0.4741 | 20200 | 3.8733 | 0.7062 | 0.7062 | 0.7367 | 0.6846 | 0.553 | [[946, 53, 1], [299, 636, 65], [34, 413, 553]] | 0.946 | 0.636 | 0.553 |
| 56.6672 | 0.4788 | 20400 | 3.8836 | 0.7048 | 0.7048 | 0.7404 | 0.6927 | 0.548 | [[955, 44, 1], [318, 631, 51], [58, 394, 548]] | 0.955 | 0.631 | 0.548 |
| 56.7036 | 0.4835 | 20600 | 3.8409 | 0.7219 | 0.7219 | 0.7546 | 0.6626 | 0.562 | [[952, 47, 1], [282, 668, 50], [49, 389, 562]] | 0.952 | 0.668 | 0.562 |
| 56.5975 | 0.4882 | 20800 | 3.8347 | 0.7205 | 0.7205 | 0.7446 | 0.6557 | 0.565 | [[968, 29, 3], [368, 565, 67], [61, 291, 648]] | 0.968 | 0.565 | 0.648 |
| 56.4269 | 0.4929 | 21000 | 3.7917 | 0.7427 | 0.7427 | 0.7637 | 0.6239 | 0.63 | [[954, 44, 2], [283, 655, 62], [26, 344, 630]] | 0.954 | 0.655 | 0.63 |
| 56.4064 | 0.4976 | 21200 | 3.8596 | 0.6763 | 0.6763 | 0.7197 | 0.7084 | 0.531 | [[979, 20, 1], [426, 531, 43], [73, 379, 548]] | 0.979 | 0.531 | 0.548 |
| 56.3241 | 0.5023 | 21400 | 3.7867 | 0.7141 | 0.7141 | 0.7496 | 0.6337 | 0.545 | [[952, 48, 0], [288, 662, 50], [31, 424, 545]] | 0.952 | 0.662 | 0.545 |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1
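
With the versions listed above, the checkpoint should load as a standard three-way sequence classifier via the Transformers pipeline API. The sketch below is illustrative only; the repository id and the printed labels are assumptions and may need adjusting, and the actual label mapping lives in the model's config.json.

```python
from transformers import pipeline

# Assumed repo id; replace with the actual checkpoint path or Hub id.
classifier = pipeline(
    "text-classification",
    model="BioPhy/match_repo_o3_prod",
)

# The router assigns one of three classes to an incoming prompt.
print(classifier("Summarize this 300-page contract and flag unusual clauses."))
# e.g. [{'label': 'LABEL_2', 'score': 0.97}]  # labels depend on config.json
```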