modernbert-llm-router
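This model is a fine-tuned version of answerdotai/ModernBERT-large. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set, which match the step 17400 row (epoch 0.4084) of the training results table: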
- Loss: 3.7888
- F1: 0.7785
- Macro F1: 0.7785
- Precision: 0.7832
- Cross Entropy: 0.5213
- Min Class Accuracy: 0.658
- Confusion Matrix: [[938, 56, 6], [232, 658, 110], [20, 232, 748]]
- Accuracy Class 0: 0.938
- Accuracy Class 1: 0.658
- Accuracy Class 2: 0.748
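The aggregate figures above can be recomputed from the confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes, and that F1 and Precision are macro averages. The card states neither convention, but both assumptions reproduce the reported values. The helper name `per_class_metrics` is illustrative and not part of the original training code.

```python
# Confusion matrix copied from the evaluation results above.
# Assumption: rows are true classes, columns are predicted classes.
CONFUSION = [[938, 56, 6], [232, 658, 110], [20, 232, 748]]


def per_class_metrics(cm):
    """Return (recall, precision, f1) lists with one entry per class."""
    n = len(cm)
    row_sums = [sum(cm[i]) for i in range(n)]                        # true counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted counts
    recall = [cm[i][i] / row_sums[i] for i in range(n)]
    precision = [cm[i][i] / col_sums[i] for i in range(n)]
    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row_sum + col_sum)
    f1 = [2 * cm[i][i] / (row_sums[i] + col_sums[i]) for i in range(n)]
    return recall, precision, f1


recall, precision, f1 = per_class_metrics(CONFUSION)
macro_f1 = sum(f1) / len(f1)
macro_precision = sum(precision) / len(precision)
min_class_accuracy = min(recall)

print("per-class accuracy:", [round(r, 3) for r in recall])
print(f"macro F1:           {macro_f1:.4f}")
print(f"macro precision:    {macro_precision:.4f}")
print(f"min class accuracy: {min_class_accuracy:.3f}")
```

Under these assumptions, the "Accuracy Class N" entries are per-class recall (the diagonal divided by the row total of 1000), and Min Class Accuracy is the smallest of them, here class 1.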
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 3
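As a rough illustration, the list above can be written as keyword arguments named after `transformers.TrainingArguments` fields. This is a sketch, not the original training script: the single-device setting (`NUM_DEVICES = 1`) is an assumption the card does not state, chosen because it makes the per-device batch size and accumulation steps multiply out to the reported total.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
# Assumption: one training device (not stated in the card).
NUM_DEVICES = 1

training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}

# Effective batch size per optimizer step.
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)
print(total_train_batch_size)
```

With these values the product is 4 × 16 × 1 = 64, matching total_train_batch_size. The dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)` with an output directory of your choice.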
Training results
Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
67.1584 | 0.0939 | 4000 | 4.3133 | 0.6905 | 0.6905 | 0.7174 | 0.6445 | 0.49 | [[964, 32, 4], [427, 490, 83], [80, 274, 646]] | 0.964 | 0.49 | 0.646 |
66.8186 | 0.0986 | 4200 | 4.2763 | 0.7056 | 0.7056 | 0.7332 | 0.6243 | 0.57 | [[951, 45, 4], [370, 570, 60], [62, 325, 613]] | 0.951 | 0.57 | 0.613 |
66.666 | 0.1033 | 4400 | 4.3091 | 0.6693 | 0.6693 | 0.7083 | 0.6652 | 0.534 | [[959, 39, 2], [400, 542, 58], [71, 395, 534]] | 0.959 | 0.542 | 0.534 |
66.5493 | 0.1080 | 4600 | 4.2950 | 0.6930 | 0.6930 | 0.7279 | 0.6567 | 0.531 | [[937, 58, 5], [314, 628, 58], [29, 440, 531]] | 0.937 | 0.628 | 0.531 |
66.4201 | 0.1127 | 4800 | 4.2264 | 0.7317 | 0.7317 | 0.7578 | 0.5987 | 0.583 | [[935, 62, 3], [250, 687, 63], [23, 394, 583]] | 0.935 | 0.687 | 0.583 |
66.2139 | 0.1174 | 5000 | 4.3305 | 0.6664 | 0.6664 | 0.6993 | 0.6979 | 0.501 | [[969, 28, 3], [423, 501, 76], [73, 366, 561]] | 0.969 | 0.501 | 0.561 |
65.8371 | 0.1221 | 5200 | 4.2472 | 0.6927 | 0.6927 | 0.7244 | 0.6322 | 0.562 | [[968, 30, 2], [375, 562, 63], [60, 368, 572]] | 0.968 | 0.562 | 0.572 |
65.7941 | 0.1268 | 5400 | 4.2528 | 0.7054 | 0.7054 | 0.7398 | 0.6475 | 0.553 | [[958, 40, 2], [323, 625, 52], [53, 394, 553]] | 0.958 | 0.625 | 0.553 |
65.5811 | 0.1314 | 5600 | 4.2208 | 0.7232 | 0.7232 | 0.7341 | 0.6087 | 0.57 | [[941, 52, 7], [320, 570, 110], [27, 298, 675]] | 0.941 | 0.57 | 0.675 |
65.4718 | 0.1361 | 5800 | 4.1907 | 0.7218 | 0.7218 | 0.7446 | 0.5945 | 0.589 | [[948, 48, 4], [282, 643, 75], [36, 375, 589]] | 0.948 | 0.643 | 0.589 |
65.2497 | 0.1408 | 6000 | 4.2049 | 0.7034 | 0.7034 | 0.7274 | 0.6252 | 0.591 | [[947, 47, 6], [330, 591, 79], [69, 340, 591]] | 0.947 | 0.591 | 0.591 |
65.1997 | 0.1455 | 6200 | 4.2574 | 0.6679 | 0.6679 | 0.7118 | 0.6690 | 0.526 | [[968, 29, 3], [431, 526, 43], [69, 393, 538]] | 0.968 | 0.526 | 0.538 |
65.0097 | 0.1502 | 6400 | 4.1857 | 0.7235 | 0.7235 | 0.7612 | 0.6106 | 0.552 | [[950, 50, 0], [277, 685, 38], [59, 389, 552]] | 0.95 | 0.685 | 0.552 |
64.8219 | 0.1549 | 6600 | 4.2046 | 0.7092 | 0.7092 | 0.7356 | 0.6392 | 0.584 | [[958, 38, 4], [329, 604, 67], [40, 376, 584]] | 0.958 | 0.604 | 0.584 |
64.7263 | 0.1596 | 6800 | 4.2920 | 0.6220 | 0.6220 | 0.7073 | 0.7250 | 0.355 | [[955, 45, 0], [360, 615, 25], [67, 578, 355]] | 0.955 | 0.615 | 0.355 |
64.6495 | 0.1643 | 7000 | 4.1895 | 0.6843 | 0.6843 | 0.7222 | 0.6356 | 0.537 | [[957, 42, 1], [364, 582, 54], [57, 406, 537]] | 0.957 | 0.582 | 0.537 |
64.3158 | 0.1690 | 7200 | 4.2260 | 0.6940 | 0.6940 | 0.7358 | 0.6636 | 0.521 | [[959, 38, 3], [331, 626, 43], [65, 414, 521]] | 0.959 | 0.626 | 0.521 |
64.3062 | 0.1737 | 7400 | 4.1442 | 0.7167 | 0.7167 | 0.7442 | 0.6048 | 0.599 | [[954, 41, 5], [331, 613, 56], [56, 345, 599]] | 0.954 | 0.613 | 0.599 |
64.0348 | 0.1784 | 7600 | 4.1174 | 0.7345 | 0.7345 | 0.7454 | 0.5703 | 0.583 | [[955, 36, 9], [304, 583, 113], [46, 270, 684]] | 0.955 | 0.583 | 0.684 |
64.0196 | 0.1831 | 7800 | 4.1144 | 0.7247 | 0.7247 | 0.7462 | 0.5859 | 0.613 | [[952, 46, 2], [301, 624, 75], [50, 337, 613]] | 0.952 | 0.624 | 0.613 |
63.8908 | 0.1878 | 8000 | 4.1876 | 0.6717 | 0.6717 | 0.7164 | 0.6808 | 0.5 | [[949, 49, 2], [361, 591, 48], [46, 454, 500]] | 0.949 | 0.591 | 0.5 |
63.7917 | 0.1925 | 8200 | 4.1457 | 0.6920 | 0.6920 | 0.7347 | 0.6319 | 0.506 | [[938, 59, 3], [302, 652, 46], [40, 454, 506]] | 0.938 | 0.652 | 0.506 |
63.6911 | 0.1972 | 8400 | 4.1293 | 0.7201 | 0.7201 | 0.7390 | 0.6186 | 0.619 | [[935, 61, 4], [297, 619, 84], [49, 332, 619]] | 0.935 | 0.619 | 0.619 |
63.4114 | 0.2019 | 8600 | 4.1279 | 0.6899 | 0.6899 | 0.7372 | 0.6263 | 0.501 | [[959, 41, 0], [327, 635, 38], [52, 447, 501]] | 0.959 | 0.635 | 0.501 |
63.312 | 0.2066 | 8800 | 4.1076 | 0.7273 | 0.7273 | 0.7630 | 0.6074 | 0.559 | [[944, 56, 0], [267, 692, 41], [34, 407, 559]] | 0.944 | 0.692 | 0.559 |
63.2253 | 0.2113 | 9000 | 4.0827 | 0.7207 | 0.7207 | 0.7399 | 0.5908 | 0.587 | [[955, 42, 3], [331, 587, 82], [45, 318, 637]] | 0.955 | 0.587 | 0.637 |
63.1791 | 0.2159 | 9200 | 4.1021 | 0.7144 | 0.7144 | 0.7459 | 0.6179 | 0.582 | [[957, 42, 1], [328, 621, 51], [58, 360, 582]] | 0.957 | 0.621 | 0.582 |
62.9892 | 0.2206 | 9400 | 4.0988 | 0.6994 | 0.6994 | 0.7354 | 0.6213 | 0.568 | [[965, 33, 2], [368, 586, 46], [59, 373, 568]] | 0.965 | 0.586 | 0.568 |
62.9596 | 0.2253 | 9600 | 4.0299 | 0.7349 | 0.7349 | 0.7547 | 0.5633 | 0.622 | [[953, 46, 1], [309, 622, 69], [33, 325, 642]] | 0.953 | 0.622 | 0.642 |
62.7207 | 0.2300 | 9800 | 4.1452 | 0.6911 | 0.6911 | 0.7252 | 0.6585 | 0.543 | [[966, 32, 2], [404, 543, 53], [64, 349, 587]] | 0.966 | 0.543 | 0.587 |
62.6702 | 0.2347 | 10000 | 4.0368 | 0.7287 | 0.7287 | 0.7537 | 0.5715 | 0.602 | [[947, 50, 3], [292, 649, 59], [32, 366, 602]] | 0.947 | 0.649 | 0.602 |
62.5446 | 0.2394 | 10200 | 4.0780 | 0.7032 | 0.7032 | 0.7319 | 0.6159 | 0.576 | [[962, 37, 1], [362, 576, 62], [44, 365, 591]] | 0.962 | 0.576 | 0.591 |
62.3668 | 0.2441 | 10400 | 4.1327 | 0.6841 | 0.6841 | 0.7314 | 0.6766 | 0.491 | [[959, 38, 3], [331, 629, 40], [39, 470, 491]] | 0.959 | 0.629 | 0.491 |
62.2441 | 0.2488 | 10600 | 4.0393 | 0.7446 | 0.7446 | 0.7581 | 0.5807 | 0.645 | [[951, 43, 6], [265, 645, 90], [30, 320, 650]] | 0.951 | 0.645 | 0.65 |
62.2008 | 0.2535 | 10800 | 3.9862 | 0.7569 | 0.7569 | 0.7644 | 0.5375 | 0.612 | [[952, 44, 4], [273, 612, 115], [30, 249, 721]] | 0.952 | 0.612 | 0.721 |
62.1253 | 0.2582 | 11000 | 4.0344 | 0.7083 | 0.7083 | 0.7336 | 0.6014 | 0.558 | [[964, 32, 4], [376, 558, 66], [49, 328, 623]] | 0.964 | 0.558 | 0.623 |
61.9099 | 0.2629 | 11200 | 4.0261 | 0.7336 | 0.7336 | 0.7616 | 0.5951 | 0.583 | [[937, 58, 5], [256, 692, 52], [40, 377, 583]] | 0.937 | 0.692 | 0.583 |
61.7465 | 0.2676 | 11400 | 4.1156 | 0.6506 | 0.6506 | 0.7096 | 0.6897 | 0.457 | [[967, 33, 0], [403, 563, 34], [41, 502, 457]] | 0.967 | 0.563 | 0.457 |
61.7948 | 0.2723 | 11600 | 4.0768 | 0.6828 | 0.6828 | 0.7276 | 0.6630 | 0.526 | [[963, 36, 1], [380, 583, 37], [49, 425, 526]] | 0.963 | 0.583 | 0.526 |
61.583 | 0.2770 | 11800 | 4.0815 | 0.6942 | 0.6942 | 0.7298 | 0.6597 | 0.57 | [[962, 37, 1], [379, 570, 51], [71, 357, 572]] | 0.962 | 0.57 | 0.572 |
61.4462 | 0.2817 | 12000 | 4.0190 | 0.7266 | 0.7266 | 0.7531 | 0.6087 | 0.596 | [[952, 47, 1], [297, 645, 58], [36, 368, 596]] | 0.952 | 0.645 | 0.596 |
61.1997 | 0.2864 | 12200 | 3.9755 | 0.7559 | 0.7559 | 0.7765 | 0.5705 | 0.623 | [[949, 48, 3], [234, 704, 62], [15, 362, 623]] | 0.949 | 0.704 | 0.623 |
61.2348 | 0.2911 | 12400 | 3.9294 | 0.7562 | 0.7562 | 0.7719 | 0.5367 | 0.657 | [[955, 43, 2], [271, 657, 72], [34, 299, 667]] | 0.955 | 0.657 | 0.667 |
61.0238 | 0.2958 | 12600 | 4.0619 | 0.6856 | 0.6856 | 0.7384 | 0.6624 | 0.473 | [[951, 49, 0], [305, 658, 37], [17, 510, 473]] | 0.951 | 0.658 | 0.473 |
61.0242 | 0.3005 | 12800 | 3.9927 | 0.6979 | 0.6979 | 0.7304 | 0.6148 | 0.571 | [[967, 31, 2], [368, 577, 55], [40, 389, 571]] | 0.967 | 0.577 | 0.571 |
60.8857 | 0.3051 | 13000 | 3.9335 | 0.7601 | 0.7601 | 0.7726 | 0.5507 | 0.668 | [[937, 59, 4], [234, 682, 84], [22, 310, 668]] | 0.937 | 0.682 | 0.668 |
60.7716 | 0.3098 | 13200 | 4.0305 | 0.7096 | 0.7096 | 0.7453 | 0.6505 | 0.565 | [[960, 39, 1], [337, 621, 42], [39, 396, 565]] | 0.96 | 0.621 | 0.565 |
60.6762 | 0.3145 | 13400 | 3.9114 | 0.7727 | 0.7727 | 0.7840 | 0.5417 | 0.686 | [[921, 77, 2], [203, 713, 84], [15, 299, 686]] | 0.921 | 0.713 | 0.686 |
60.5466 | 0.3192 | 13600 | 3.9843 | 0.7247 | 0.7247 | 0.7476 | 0.6131 | 0.601 | [[958, 41, 1], [296, 631, 73], [42, 357, 601]] | 0.958 | 0.631 | 0.601 |
60.5036 | 0.3239 | 13800 | 3.9504 | 0.7298 | 0.7298 | 0.7534 | 0.5888 | 0.593 | [[948, 49, 3], [271, 661, 68], [28, 379, 593]] | 0.948 | 0.661 | 0.593 |
60.3629 | 0.3286 | 14000 | 3.9785 | 0.7168 | 0.7168 | 0.7382 | 0.6287 | 0.544 | [[965, 33, 2], [381, 544, 75], [44, 294, 662]] | 0.965 | 0.544 | 0.662 |
60.259 | 0.3333 | 14200 | 3.8927 | 0.7358 | 0.7358 | 0.7532 | 0.5535 | 0.631 | [[943, 55, 2], [292, 631, 77], [30, 326, 644]] | 0.943 | 0.631 | 0.644 |
60.0233 | 0.3380 | 14400 | 3.9874 | 0.7001 | 0.7001 | 0.7249 | 0.6405 | 0.536 | [[968, 28, 4], [394, 536, 70], [32, 349, 619]] | 0.968 | 0.536 | 0.619 |
59.9092 | 0.3427 | 14600 | 3.9597 | 0.7173 | 0.7173 | 0.7462 | 0.6215 | 0.571 | [[943, 53, 4], [291, 651, 58], [23, 406, 571]] | 0.943 | 0.651 | 0.571 |
59.827 | 0.3474 | 14800 | 3.9515 | 0.7201 | 0.7201 | 0.7480 | 0.6193 | 0.589 | [[952, 46, 2], [309, 634, 57], [44, 367, 589]] | 0.952 | 0.634 | 0.589 |
59.7204 | 0.3521 | 15000 | 3.8741 | 0.7626 | 0.7626 | 0.7738 | 0.5534 | 0.666 | [[929, 65, 6], [210, 698, 92], [20, 314, 666]] | 0.929 | 0.698 | 0.666 |
59.667 | 0.3568 | 15200 | 3.9475 | 0.7208 | 0.7208 | 0.7517 | 0.6281 | 0.553 | [[940, 57, 3], [257, 683, 60], [23, 424, 553]] | 0.94 | 0.683 | 0.553 |
59.542 | 0.3615 | 15400 | 3.9530 | 0.6913 | 0.6913 | 0.7331 | 0.6452 | 0.532 | [[961, 39, 0], [357, 602, 41], [32, 436, 532]] | 0.961 | 0.602 | 0.532 |
59.4635 | 0.3662 | 15600 | 3.9513 | 0.7037 | 0.7037 | 0.7344 | 0.6504 | 0.582 | [[965, 32, 3], [360, 582, 58], [61, 354, 585]] | 0.965 | 0.582 | 0.585 |
59.2113 | 0.3709 | 15800 | 3.8991 | 0.7461 | 0.7461 | 0.7663 | 0.5940 | 0.632 | [[947, 51, 2], [269, 668, 63], [21, 347, 632]] | 0.947 | 0.668 | 0.632 |
59.2237 | 0.3756 | 16000 | 3.9010 | 0.7422 | 0.7422 | 0.7663 | 0.5978 | 0.611 | [[943, 55, 2], [263, 681, 56], [17, 372, 611]] | 0.943 | 0.681 | 0.611 |
59.0031 | 0.3803 | 16200 | 3.9380 | 0.7139 | 0.7139 | 0.7373 | 0.6298 | 0.573 | [[961, 37, 2], [357, 573, 70], [43, 331, 626]] | 0.961 | 0.573 | 0.626 |
59.0405 | 0.3850 | 16400 | 3.8858 | 0.7261 | 0.7261 | 0.7552 | 0.6002 | 0.583 | [[950, 48, 2], [290, 658, 52], [25, 392, 583]] | 0.95 | 0.658 | 0.583 |
58.8338 | 0.3896 | 16600 | 3.8715 | 0.7270 | 0.7270 | 0.7510 | 0.5955 | 0.595 | [[946, 51, 3], [280, 653, 67], [38, 367, 595]] | 0.946 | 0.653 | 0.595 |
58.8746 | 0.3943 | 16800 | 3.9528 | 0.7002 | 0.7002 | 0.7382 | 0.6745 | 0.553 | [[956, 42, 2], [349, 610, 41], [45, 402, 553]] | 0.956 | 0.61 | 0.553 |
58.6048 | 0.3990 | 17000 | 3.8640 | 0.7368 | 0.7368 | 0.7687 | 0.5965 | 0.575 | [[938, 61, 1], [248, 707, 45], [23, 402, 575]] | 0.938 | 0.707 | 0.575 |
58.4862 | 0.4037 | 17200 | 3.9524 | 0.6945 | 0.6945 | 0.7404 | 0.6891 | 0.511 | [[952, 46, 2], [323, 642, 35], [40, 449, 511]] | 0.952 | 0.642 | 0.511 |
58.4176 | 0.4084 | 17400 | 3.7888 | 0.7785 | 0.7785 | 0.7832 | 0.5213 | 0.658 | [[938, 56, 6], [232, 658, 110], [20, 232, 748]] | 0.938 | 0.658 | 0.748 |
58.3256 | 0.4131 | 17600 | 3.8355 | 0.7427 | 0.7427 | 0.7628 | 0.5795 | 0.626 | [[946, 52, 2], [266, 666, 68], [28, 346, 626]] | 0.946 | 0.666 | 0.626 |
58.193 | 0.4178 | 17800 | 3.9014 | 0.7114 | 0.7114 | 0.7458 | 0.6464 | 0.551 | [[951, 48, 1], [300, 649, 51], [37, 412, 551]] | 0.951 | 0.649 | 0.551 |
58.0348 | 0.4225 | 18000 | 3.9505 | 0.6624 | 0.6624 | 0.7268 | 0.7058 | 0.441 | [[961, 39, 0], [354, 620, 26], [35, 524, 441]] | 0.961 | 0.62 | 0.441 |
57.9114 | 0.4272 | 18200 | 3.9220 | 0.7076 | 0.7076 | 0.7400 | 0.6741 | 0.577 | [[964, 35, 1], [350, 600, 50], [37, 386, 577]] | 0.964 | 0.6 | 0.577 |
57.9024 | 0.4319 | 18400 | 3.8897 | 0.6970 | 0.6970 | 0.7320 | 0.6580 | 0.551 | [[964, 34, 2], [351, 597, 52], [35, 414, 551]] | 0.964 | 0.597 | 0.551 |
57.819 | 0.4366 | 18600 | 3.8781 | 0.7186 | 0.7186 | 0.7462 | 0.6470 | 0.587 | [[959, 39, 2], [315, 626, 59], [35, 378, 587]] | 0.959 | 0.626 | 0.587 |
57.7407 | 0.4413 | 18800 | 3.9488 | 0.6797 | 0.6797 | 0.7243 | 0.7260 | 0.498 | [[960, 38, 2], [345, 608, 47], [44, 458, 498]] | 0.96 | 0.608 | 0.498 |
57.5802 | 0.4460 | 19000 | 3.8100 | 0.7301 | 0.7301 | 0.7501 | 0.6038 | 0.619 | [[952, 46, 2], [293, 632, 75], [22, 359, 619]] | 0.952 | 0.632 | 0.619 |
57.4922 | 0.4507 | 19200 | 3.8570 | 0.7275 | 0.7275 | 0.7461 | 0.6478 | 0.616 | [[953, 46, 1], [301, 616, 83], [39, 333, 628]] | 0.953 | 0.616 | 0.628 |
57.3413 | 0.4554 | 19400 | 3.8799 | 0.6979 | 0.6979 | 0.7352 | 0.6764 | 0.539 | [[951, 47, 2], [328, 623, 49], [44, 417, 539]] | 0.951 | 0.623 | 0.539 |
57.2835 | 0.4601 | 19600 | 3.7841 | 0.7574 | 0.7574 | 0.7780 | 0.5797 | 0.618 | [[928, 69, 3], [200, 732, 68], [24, 358, 618]] | 0.928 | 0.732 | 0.618 |
57.0308 | 0.4648 | 19800 | 3.8292 | 0.7367 | 0.7367 | 0.7564 | 0.6354 | 0.617 | [[942, 54, 4], [266, 661, 73], [21, 362, 617]] | 0.942 | 0.661 | 0.617 |
56.9914 | 0.4695 | 20000 | 3.8292 | 0.7219 | 0.7219 | 0.7428 | 0.6366 | 0.601 | [[951, 46, 3], [324, 601, 75], [52, 319, 629]] | 0.951 | 0.601 | 0.629 |
56.868 | 0.4741 | 20200 | 3.8733 | 0.7062 | 0.7062 | 0.7367 | 0.6846 | 0.553 | [[946, 53, 1], [299, 636, 65], [34, 413, 553]] | 0.946 | 0.636 | 0.553 |
56.6672 | 0.4788 | 20400 | 3.8836 | 0.7048 | 0.7048 | 0.7404 | 0.6927 | 0.548 | [[955, 44, 1], [318, 631, 51], [58, 394, 548]] | 0.955 | 0.631 | 0.548 |
56.7036 | 0.4835 | 20600 | 3.8409 | 0.7219 | 0.7219 | 0.7546 | 0.6626 | 0.562 | [[952, 47, 1], [282, 668, 50], [49, 389, 562]] | 0.952 | 0.668 | 0.562 |
56.5975 | 0.4882 | 20800 | 3.8347 | 0.7205 | 0.7205 | 0.7446 | 0.6557 | 0.565 | [[968, 29, 3], [368, 565, 67], [61, 291, 648]] | 0.968 | 0.565 | 0.648 |
56.4269 | 0.4929 | 21000 | 3.7917 | 0.7427 | 0.7427 | 0.7637 | 0.6239 | 0.63 | [[954, 44, 2], [283, 655, 62], [26, 344, 630]] | 0.954 | 0.655 | 0.63 |
56.4064 | 0.4976 | 21200 | 3.8596 | 0.6763 | 0.6763 | 0.7197 | 0.7084 | 0.531 | [[979, 20, 1], [426, 531, 43], [73, 379, 548]] | 0.979 | 0.531 | 0.548 |
56.3241 | 0.5023 | 21400 | 3.7867 | 0.7141 | 0.7141 | 0.7496 | 0.6337 | 0.545 | [[952, 48, 0], [288, 662, 50], [31, 424, 545]] | 0.952 | 0.662 | 0.545 |
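The table reports both validation loss and macro F1, and the two do not peak at the same step. The sketch below takes a few rows copied from the table and selects a checkpoint under each criterion. The card does not say which criterion was used; the headline results match the row with the highest macro F1. The tuple layout and variable names are illustrative.

```python
# (step, epoch, validation_loss, macro_f1) for a few rows copied from the table above.
ROWS = [
    (13400, 0.3145, 3.9114, 0.7727),
    (15000, 0.3521, 3.8741, 0.7626),
    (17400, 0.4084, 3.7888, 0.7785),
    (19600, 0.4601, 3.7841, 0.7574),
    (21400, 0.5023, 3.7867, 0.7141),
]

best_by_f1 = max(ROWS, key=lambda r: r[3])    # highest macro F1
best_by_loss = min(ROWS, key=lambda r: r[2])  # lowest validation loss

# Rough optimizer steps per epoch, estimated from the last logged row.
# The epoch column is rounded to 4 decimals, so this is approximate.
last_step, last_epoch = ROWS[-1][0], ROWS[-1][1]
steps_per_epoch = last_step / last_epoch

print("best by macro F1:       step", best_by_f1[0])
print("best by validation loss: step", best_by_loss[0])
print("approx. steps per epoch:", round(steps_per_epoch))
```

On these rows, macro F1 selects step 17400 and validation loss selects step 19600. The steps-per-epoch estimate is roughly 42,600, so the logged rows end at about half of one epoch even though num_epochs is set to 3.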
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.1
- Tokenizers 0.21.1
Model tree for BioPhy/match_repo_o3_prod
Base model
answerdotai/ModernBERT-large