Se124M100KInfPrompt_WT_EOS_Label_Smooth

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1617
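
Training used a label smoothing factor of 0.1, so this value is the label-smoothed cross-entropy rather than the plain negative log-likelihood; exponentiating it therefore gives only a rough (typically inflated) estimate of perplexity. A minimal sketch of the conversion:

```python
import math

eval_loss = 2.1617  # label-smoothed cross-entropy reported above

# Naive conversion; because the loss includes the smoothing penalty,
# this typically overstates the true token-level perplexity.
print(round(math.exp(eval_loss), 2))  # ~8.69
```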

Model description

More information needed

Intended uses & limitations

More information needed
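
No usage instructions are provided in the card. As a placeholder, the sketch below shows one plausible way to load this PEFT adapter on top of the gpt2 base model with the libraries listed under Framework versions; the repo id is taken from the model page, while the prompt and generation settings are illustrative assumptions only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the gpt2 base model and attach this adapter (assumed repo id).
base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "augustocsc/Se124M100KInfPrompt_WT_EOS_Label_Smooth")
model.eval()

# Placeholder prompt; adjust to the prompting format the model was trained on.
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```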

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 5
  • label_smoothing_factor: 0.1
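
For reference, the sketch below shows how these values would map onto Hugging Face TrainingArguments. Only the hyperparameters listed above are taken from the card; the output directory is a placeholder, and the dataset, model, and PEFT/LoRA configuration are not specified here.

```python
from transformers import TrainingArguments

# Hedged reproduction sketch; values mirror the list above.
training_args = TrainingArguments(
    output_dir="Se124M100KInfPrompt_WT_EOS_Label_Smooth",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,  # 16 x 4 = total train batch size 64
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=5,
    label_smoothing_factor=0.1,
)
```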

Training results

Training Loss Epoch Step Validation Loss
19.0108 0.0164 20 4.5546
18.9846 0.0327 40 4.5496
18.9358 0.0491 60 4.5317
18.9041 0.0655 80 4.5064
18.5139 0.0819 100 4.4510
18.2552 0.0982 120 4.3778
17.9591 0.1146 140 4.2887
17.322 0.1310 160 4.1740
16.8904 0.1474 180 4.0367
16.3008 0.1637 200 3.8727
15.7206 0.1801 220 3.7338
15.0629 0.1965 240 3.5839
14.5732 0.2129 260 3.4582
14.0294 0.2292 280 3.3334
13.5401 0.2456 300 3.2152
13.2086 0.2620 320 3.1111
12.8368 0.2783 340 3.0145
12.4193 0.2947 360 2.9264
11.9977 0.3111 380 2.8435
11.7412 0.3275 400 2.7679
11.5111 0.3438 420 2.6997
11.2634 0.3602 440 2.6409
11.0944 0.3766 460 2.5882
10.8847 0.3930 480 2.5459
10.6994 0.4093 500 2.5115
10.5561 0.4257 520 2.4840
10.4584 0.4421 540 2.4625
10.3285 0.4585 560 2.4426
10.2707 0.4748 580 2.4283
10.124 0.4912 600 2.4124
10.0467 0.5076 620 2.3997
9.955 0.5239 640 2.3867
9.893 0.5403 660 2.3739
9.8885 0.5567 680 2.3627
9.8025 0.5731 700 2.3534
9.7378 0.5894 720 2.3436
9.6593 0.6058 740 2.3343
9.6328 0.6222 760 2.3285
9.585 0.6386 780 2.3200
9.5782 0.6549 800 2.3149
9.5723 0.6713 820 2.3079
9.4824 0.6877 840 2.3040
9.4909 0.7041 860 2.2979
9.4709 0.7204 880 2.2943
9.4306 0.7368 900 2.2877
9.4688 0.7532 920 2.2841
9.4184 0.7695 940 2.2805
9.3729 0.7859 960 2.2780
9.3634 0.8023 980 2.2768
9.3779 0.8187 1000 2.2718
9.3945 0.8350 1020 2.2683
9.3539 0.8514 1040 2.2668
9.2679 0.8678 1060 2.2614
9.2974 0.8842 1080 2.2592
9.2907 0.9005 1100 2.2591
9.2787 0.9169 1120 2.2559
9.3063 0.9333 1140 2.2561
9.2133 0.9497 1160 2.2538
9.2134 0.9660 1180 2.2497
9.2464 0.9824 1200 2.2472
9.1947 0.9988 1220 2.2476
8.9748 1.0147 1240 2.2448
9.209 1.0311 1260 2.2441
9.1883 1.0475 1280 2.2431
9.1606 1.0639 1300 2.2415
9.1384 1.0802 1320 2.2379
9.1674 1.0966 1340 2.2368
9.144 1.1130 1360 2.2347
9.095 1.1293 1380 2.2343
9.0968 1.1457 1400 2.2331
9.1172 1.1621 1420 2.2303
9.084 1.1785 1440 2.2304
9.1447 1.1948 1460 2.2274
9.0922 1.2112 1480 2.2261
9.0842 1.2276 1500 2.2258
9.0883 1.2440 1520 2.2258
9.0633 1.2603 1540 2.2256
9.065 1.2767 1560 2.2211
9.0941 1.2931 1580 2.2225
9.071 1.3095 1600 2.2219
9.0327 1.3258 1620 2.2205
9.0414 1.3422 1640 2.2183
9.0465 1.3586 1660 2.2171
9.0607 1.3749 1680 2.2179
9.0418 1.3913 1700 2.2150
9.0179 1.4077 1720 2.2146
9.0646 1.4241 1740 2.2160
8.9973 1.4404 1760 2.2140
9.0051 1.4568 1780 2.2124
9.023 1.4732 1800 2.2136
8.9645 1.4896 1820 2.2082
9.0326 1.5059 1840 2.2099
9.009 1.5223 1860 2.2081
9.0208 1.5387 1880 2.2100
8.9923 1.5551 1900 2.2074
9.0036 1.5714 1920 2.2067
9.0372 1.5878 1940 2.2050
8.9954 1.6042 1960 2.2058
8.9362 1.6205 1980 2.2063
8.9638 1.6369 2000 2.2049
8.9343 1.6533 2020 2.2022
8.9586 1.6697 2040 2.2018
9.0058 1.6860 2060 2.2022
8.9595 1.7024 2080 2.2002
8.9547 1.7188 2100 2.1979
8.9423 1.7352 2120 2.1992
8.9637 1.7515 2140 2.1980
8.9599 1.7679 2160 2.1984
8.9396 1.7843 2180 2.1964
8.9515 1.8007 2200 2.1947
8.9479 1.8170 2220 2.1962
8.9487 1.8334 2240 2.1938
8.9059 1.8498 2260 2.1944
8.9323 1.8661 2280 2.1948
8.9462 1.8825 2300 2.1946
8.9453 1.8989 2320 2.1931
8.8958 1.9153 2340 2.1918
8.9608 1.9316 2360 2.1924
8.8996 1.9480 2380 2.1898
8.9414 1.9644 2400 2.1890
8.9095 1.9808 2420 2.1883
8.8899 1.9971 2440 2.1882
8.645 2.0131 2460 2.1876
8.9385 2.0295 2480 2.1892
8.8846 2.0458 2500 2.1876
8.8618 2.0622 2520 2.1868
8.9023 2.0786 2540 2.1857
8.9133 2.0950 2560 2.1849
8.9055 2.1113 2580 2.1854
8.891 2.1277 2600 2.1843
8.9237 2.1441 2620 2.1849
8.887 2.1605 2640 2.1835
8.9018 2.1768 2660 2.1825
8.9009 2.1932 2680 2.1831
8.8959 2.2096 2700 2.1842
8.8711 2.2260 2720 2.1823
8.891 2.2423 2740 2.1811
8.8813 2.2587 2760 2.1805
8.8852 2.2751 2780 2.1817
8.8702 2.2914 2800 2.1816
8.8775 2.3078 2820 2.1779
8.867 2.3242 2840 2.1802
8.8928 2.3406 2860 2.1781
8.9039 2.3569 2880 2.1784
8.8728 2.3733 2900 2.1798
8.8428 2.3897 2920 2.1774
8.8585 2.4061 2940 2.1786
8.879 2.4224 2960 2.1765
8.8633 2.4388 2980 2.1768
8.8498 2.4552 3000 2.1766
8.8998 2.4716 3020 2.1757
8.8642 2.4879 3040 2.1746
8.8752 2.5043 3060 2.1777
8.8417 2.5207 3080 2.1765
8.8695 2.5370 3100 2.1772
8.8683 2.5534 3120 2.1771
8.8323 2.5698 3140 2.1767
8.8448 2.5862 3160 2.1769
8.8549 2.6025 3180 2.1762
8.8315 2.6189 3200 2.1728
8.8652 2.6353 3220 2.1766
8.8402 2.6517 3240 2.1766
8.8491 2.6680 3260 2.1740
8.8438 2.6844 3280 2.1761
8.8378 2.7008 3300 2.1749
8.8587 2.7172 3320 2.1758
8.8655 2.7335 3340 2.1738
8.8079 2.7499 3360 2.1741
8.8234 2.7663 3380 2.1734
8.8389 2.7826 3400 2.1737
8.8085 2.7990 3420 2.1727
8.8397 2.8154 3440 2.1716
8.8679 2.8318 3460 2.1725
8.8381 2.8481 3480 2.1711
8.8267 2.8645 3500 2.1731
8.8671 2.8809 3520 2.1710
8.8439 2.8973 3540 2.1707
8.8276 2.9136 3560 2.1715
8.8624 2.9300 3580 2.1717
8.8096 2.9464 3600 2.1719
8.8429 2.9628 3620 2.1711
8.8152 2.9791 3640 2.1719
8.7951 2.9955 3660 2.1718
8.6203 3.0115 3680 2.1699
8.8258 3.0278 3700 2.1702
8.8285 3.0442 3720 2.1690
8.8504 3.0606 3740 2.1696
8.8282 3.0770 3760 2.1683
8.8457 3.0933 3780 2.1687
8.8096 3.1097 3800 2.1696
8.8035 3.1261 3820 2.1692
8.8099 3.1424 3840 2.1695
8.7912 3.1588 3860 2.1690
8.8371 3.1752 3880 2.1675
8.8418 3.1916 3900 2.1696
8.821 3.2079 3920 2.1685
8.7993 3.2243 3940 2.1673
8.7873 3.2407 3960 2.1680
8.7995 3.2571 3980 2.1672
8.7745 3.2734 4000 2.1669
8.8271 3.2898 4020 2.1682
8.8021 3.3062 4040 2.1670
8.8327 3.3226 4060 2.1669
8.8031 3.3389 4080 2.1676
8.7912 3.3553 4100 2.1670
8.8087 3.3717 4120 2.1669
8.8377 3.3880 4140 2.1677
8.8045 3.4044 4160 2.1674
8.7921 3.4208 4180 2.1663
8.8128 3.4372 4200 2.1670
8.8479 3.4535 4220 2.1668
8.8072 3.4699 4240 2.1668
8.7718 3.4863 4260 2.1665
8.8012 3.5027 4280 2.1666
8.809 3.5190 4300 2.1666
8.8306 3.5354 4320 2.1653
8.8264 3.5518 4340 2.1654
8.8202 3.5682 4360 2.1651
8.793 3.5845 4380 2.1643
8.8171 3.6009 4400 2.1647
8.8277 3.6173 4420 2.1643
8.8055 3.6336 4440 2.1650
8.7796 3.6500 4460 2.1651
8.8176 3.6664 4480 2.1645
8.7721 3.6828 4500 2.1651
8.7966 3.6991 4520 2.1649
8.841 3.7155 4540 2.1649
8.8044 3.7319 4560 2.1641
8.7891 3.7483 4580 2.1638
8.7594 3.7646 4600 2.1639
8.7963 3.7810 4620 2.1636
8.8074 3.7974 4640 2.1638
8.8025 3.8138 4660 2.1641
8.8361 3.8301 4680 2.1635
8.8129 3.8465 4700 2.1641
8.7971 3.8629 4720 2.1642
8.8033 3.8792 4740 2.1639
8.78 3.8956 4760 2.1637
8.8012 3.9120 4780 2.1638
8.8109 3.9284 4800 2.1633
8.8402 3.9447 4820 2.1634
8.7862 3.9611 4840 2.1634
8.8204 3.9775 4860 2.1630
8.8033 3.9939 4880 2.1637
8.5571 4.0098 4900 2.1625
8.8107 4.0262 4920 2.1628
8.8381 4.0426 4940 2.1637
8.7981 4.0589 4960 2.1626
8.7817 4.0753 4980 2.1626
8.7938 4.0917 5000 2.1635
8.8026 4.1081 5020 2.1638
8.7924 4.1244 5040 2.1622
8.8206 4.1408 5060 2.1629
8.7942 4.1572 5080 2.1633
8.7939 4.1736 5100 2.1627
8.8211 4.1899 5120 2.1622
8.7513 4.2063 5140 2.1630
8.79 4.2227 5160 2.1635
8.8063 4.2391 5180 2.1631
8.8049 4.2554 5200 2.1626
8.8196 4.2718 5220 2.1627
8.8215 4.2882 5240 2.1631
8.798 4.3045 5260 2.1634
8.7946 4.3209 5280 2.1621
8.7797 4.3373 5300 2.1619
8.8163 4.3537 5320 2.1627
8.7569 4.3700 5340 2.1621
8.7671 4.3864 5360 2.1629
8.7883 4.4028 5380 2.1628
8.7788 4.4192 5400 2.1628
8.7826 4.4355 5420 2.1621
8.7884 4.4519 5440 2.1622
8.8011 4.4683 5460 2.1628
8.796 4.4847 5480 2.1629
8.7943 4.5010 5500 2.1627
8.8184 4.5174 5520 2.1618
8.7747 4.5338 5540 2.1629
8.784 4.5501 5560 2.1630
8.8176 4.5665 5580 2.1628
8.8134 4.5829 5600 2.1624
8.7711 4.5993 5620 2.1629
8.7939 4.6156 5640 2.1631
8.8057 4.6320 5660 2.1631
8.8042 4.6484 5680 2.1623
8.8248 4.6648 5700 2.1624
8.7954 4.6811 5720 2.1626
8.7767 4.6975 5740 2.1622
8.7603 4.7139 5760 2.1631
8.8185 4.7302 5780 2.1632
8.7975 4.7466 5800 2.1629
8.7933 4.7630 5820 2.1627
8.7949 4.7794 5840 2.1627
8.7701 4.7957 5860 2.1629
8.7875 4.8121 5880 2.1628
8.7731 4.8285 5900 2.1627
8.8287 4.8449 5920 2.1629
8.7871 4.8612 5940 2.1626
8.7655 4.8776 5960 2.1627
8.7744 4.8940 5980 2.1629
8.764 4.9104 6000 2.1618
8.8085 4.9267 6020 2.1627
8.7985 4.9431 6040 2.1629
8.8205 4.9595 6060 2.1631
8.866 4.9758 6080 2.1619
8.785 4.9922 6100 2.1617

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1