---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0945
- Num Input Tokens Seen: 51175712

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6385        | 0.0052 | 5    | 1.3879          | 271624            |
| 1.6433        | 0.0105 | 10   | 1.3632          | 531728            |
| 1.5049        | 0.0157 | 15   | 1.3069          | 799224            |
| 1.4197        | 0.0210 | 20   | 1.2564          | 1066744           |
| 1.3092        | 0.0262 | 25   | 1.2152          | 1340184           |
| 1.2077        | 0.0315 | 30   | 1.1824          | 1611416           |
| 1.0873        | 0.0367 | 35   | 1.1982          | 1861336           |
| 1.0458        | 0.0420 | 40   | 1.2086          | 2129696           |
| 0.8771        | 0.0472 | 45   | 1.2089          | 2398856           |
| 0.6921        | 0.0525 | 50   | 1.2765          | 2665824           |
| 0.5044        | 0.0577 | 55   | 1.2621          | 2935368           |
| 0.5965        | 0.0630 | 60   | 1.2718          | 3202496           |
| 0.4228        | 0.0682 | 65   | 1.2536          | 3470784           |
| 0.3887        | 0.0734 | 70   | 1.2335          | 3734424           |
| 0.3822        | 0.0787 | 75   | 1.2310          | 3998152           |
| 0.4025        | 0.0839 | 80   | 1.2048          | 4272168           |
| 0.3132        | 0.0892 | 85   | 1.2041          | 4536200           |
| 0.3385        | 0.0944 | 90   | 1.2099          | 4799144           |
| 0.2833        | 0.0997 | 95   | 1.1906          | 5072656           |
| 0.2796        | 0.1049 | 100  | 1.1919          | 5344504           |
| 0.1858        | 0.1102 | 105  | 1.1813          | 5610600           |
| 0.249         | 0.1154 | 110  | 1.1853          | 5878120           |
| 0.2275        | 0.1207 | 115  | 1.1839          | 6143552           |
| 0.2511        | 0.1259 | 120  | 1.1824          | 6413392           |
| 0.3556        | 0.1312 | 125  | 1.1811          | 6680192           |
| 0.176         | 0.1364 | 130  | 1.1737          | 6941568           |
| 0.2581        | 0.1416 | 135  | 1.1701          | 7205544           |
| 0.222         | 0.1469 | 140  | 1.1711          | 7480072           |
| 0.2517        | 0.1521 | 145  | 1.1659          | 7750744           |
| 0.2425        | 0.1574 | 150  | 1.1641          | 8022208           |
| 0.2457        | 0.1626 | 155  | 1.1649          | 8296160           |
| 0.2867        | 0.1679 | 160  | 1.1597          | 8569848           |
| 0.1405        | 0.1731 | 165  | 1.1626          | 8833768           |
| 0.2254        | 0.1784 | 170  | 1.1618          | 9101328           |
| 0.2241        | 0.1836 | 175  | 1.1544          | 9368632           |
| 0.2379        | 0.1889 | 180  | 1.1580          | 9636496           |
| 0.2245        | 0.1941 | 185  | 1.1540          | 9900576           |
| 0.2203        | 0.1994 | 190  | 1.1510          | 10169840          |
| 0.2859        | 0.2046 | 195  | 1.1524          | 10443184          |
| 0.208         | 0.2098 | 200  | 1.1504          | 10715800          |
| 0.2657        | 0.2151 | 205  | 1.1489          | 10982672          |
| 0.1606        | 0.2203 | 210  | 1.1471          | 11257472          |
| 0.1658        | 0.2256 | 215  | 1.1481          | 11522464          |
| 0.2363        | 0.2308 | 220  | 1.1469          | 11787120          |
| 0.1589        | 0.2361 | 225  | 1.1472          | 12053088          |
| 0.1843        | 0.2413 | 230  | 1.1456          | 12329248          |
| 0.2811        | 0.2466 | 235  | 1.1443          | 12596816          |
| 0.2504        | 0.2518 | 240  | 1.1441          | 12865736          |
| 0.2208        | 0.2571 | 245  | 1.1416          | 13136632          |
| 0.219         | 0.2623 | 250  | 1.1414          | 13398592          |
| 0.2519        | 0.2676 | 255  | 1.1409          | 13673896          |
| 0.1821        | 0.2728 | 260  | 1.1376          | 13942448          |
| 0.1376        | 0.2781 | 265  | 1.1420          | 14210040          |
| 0.2355        | 0.2833 | 270  | 1.1373          | 14479896          |
| 0.2076        | 0.2885 | 275  | 1.1361          | 14751016          |
| 0.1938        | 0.2938 | 280  | 1.1406          | 15021448          |
| 0.2384        | 0.2990 | 285  | 1.1335          | 15280872          |
| 0.2672        | 0.3043 | 290  | 1.1346          | 15543056          |
| 0.211         | 0.3095 | 295  | 1.1354          | 15810904          |
| 0.2775        | 0.3148 | 300  | 1.1331          | 16080016          |
| 0.126         | 0.3200 | 305  | 1.1321          | 16353688          |
| 0.2124        | 0.3253 | 310  | 1.1323          | 16626304          |
| 0.2067        | 0.3305 | 315  | 1.1290          | 16891864          |
| 0.223         | 0.3358 | 320  | 1.1309          | 17161824          |
| 0.219         | 0.3410 | 325  | 1.1325          | 17432392          |
| 0.1981        | 0.3463 | 330  | 1.1281          | 17702632          |
| 0.1413        | 0.3515 | 335  | 1.1288          | 17975384          |
| 0.1306        | 0.3567 | 340  | 1.1287          | 18249784          |
| 0.2086        | 0.3620 | 345  | 1.1287          | 18513992          |
| 0.2131        | 0.3672 | 350  | 1.1257          | 18785208          |
| 0.2322        | 0.3725 | 355  | 1.1279          | 19057760          |
| 0.193         | 0.3777 | 360  | 1.1274          | 19326416          |
| 0.2152        | 0.3830 | 365  | 1.1256          | 19589776          |
| 0.1853        | 0.3882 | 370  | 1.1229          | 19859024          |
| 0.152         | 0.3935 | 375  | 1.1260          | 20127728          |
| 0.2626        | 0.3987 | 380  | 1.1228          | 20399736          |
| 0.2866        | 0.4040 | 385  | 1.1207          | 20671496          |
| 0.2188        | 0.4092 | 390  | 1.1238          | 20944784          |
| 0.2403        | 0.4145 | 395  | 1.1215          | 21213824          |
| 0.2303        | 0.4197 | 400  | 1.1219          | 21485816          |
| 0.2451        | 0.4249 | 405  | 1.1208          | 21759368          |
| 0.1682        | 0.4302 | 410  | 1.1191          | 22030608          |
| 0.1945        | 0.4354 | 415  | 1.1202          | 22302928          |
| 0.2122        | 0.4407 | 420  | 1.1206          | 22567912          |
| 0.2038        | 0.4459 | 425  | 1.1179          | 22839344          |
| 0.1775        | 0.4512 | 430  | 1.1189          | 23110192          |
| 0.248         | 0.4564 | 435  | 1.1186          | 23385984          |
| 0.1564        | 0.4617 | 440  | 1.1176          | 23656368          |
| 0.2442        | 0.4669 | 445  | 1.1205          | 23925760          |
| 0.1851        | 0.4722 | 450  | 1.1180          | 24192416          |
| 0.2148        | 0.4774 | 455  | 1.1164          | 24455504          |
| 0.1515        | 0.4827 | 460  | 1.1170          | 24721184          |
| 0.1828        | 0.4879 | 465  | 1.1174          | 24990064          |
| 0.2011        | 0.4931 | 470  | 1.1166          | 25255856          |
| 0.2027        | 0.4984 | 475  | 1.1164          | 25523776          |
| 0.1516        | 0.5036 | 480  | 1.1150          | 25790296          |
| 0.2105        | 0.5089 | 485  | 1.1148          | 26052616          |
| 0.1914        | 0.5141 | 490  | 1.1129          | 26319264          |
| 0.2359        | 0.5194 | 495  | 1.1137          | 26593128          |
| 0.1381        | 0.5246 | 500  | 1.1161          | 26862440          |
| 0.1915        | 0.5299 | 505  | 1.1142          | 27123760          |
| 0.1205        | 0.5351 | 510  | 1.1135          | 27392640          |
| 0.2322        | 0.5404 | 515  | 1.1137          | 27664784          |
| 0.151         | 0.5456 | 520  | 1.1116          | 27935984          |
| 0.2365        | 0.5509 | 525  | 1.1115          | 28211288          |
| 0.2168        | 0.5561 | 530  | 1.1144          | 28477568          |
| 0.1178        | 0.5613 | 535  | 1.1119          | 28742552          |
| 0.2171        | 0.5666 | 540  | 1.1114          | 29017040          |
| 0.104         | 0.5718 | 545  | 1.1124          | 29287360          |
| 0.2219        | 0.5771 | 550  | 1.1115          | 29554808          |
| 0.2235        | 0.5823 | 555  | 1.1098          | 29820936          |
| 0.2177        | 0.5876 | 560  | 1.1099          | 30088000          |
| 0.176         | 0.5928 | 565  | 1.1100          | 30349872          |
| 0.2121        | 0.5981 | 570  | 1.1088          | 30615816          |
| 0.2045        | 0.6033 | 575  | 1.1084          | 30880216          |
| 0.267         | 0.6086 | 580  | 1.1119          | 31144872          |
| 0.1728        | 0.6138 | 585  | 1.1094          | 31411192          |
| 0.1475        | 0.6191 | 590  | 1.1059          | 31675568          |
| 0.2079        | 0.6243 | 595  | 1.1088          | 31946312          |
| 0.2596        | 0.6295 | 600  | 1.1085          | 32220528          |
| 0.1331        | 0.6348 | 605  | 1.1074          | 32485712          |
| 0.2242        | 0.6400 | 610  | 1.1078          | 32752832          |
| 0.1945        | 0.6453 | 615  | 1.1072          | 33018800          |
| 0.1944        | 0.6505 | 620  | 1.1043          | 33286032          |
| 0.1981        | 0.6558 | 625  | 1.1058          | 33559320          |
| 0.2431        | 0.6610 | 630  | 1.1069          | 33827288          |
| 0.2074        | 0.6663 | 635  | 1.1044          | 34093824          |
| 0.1961        | 0.6715 | 640  | 1.1054          | 34358032          |
| 0.1657        | 0.6768 | 645  | 1.1067          | 34625840          |
| 0.1148        | 0.6820 | 650  | 1.1059          | 34887960          |
| 0.2367        | 0.6873 | 655  | 1.1055          | 35159816          |
| 0.2539        | 0.6925 | 660  | 1.1056          | 35427320          |
| 0.1738        | 0.6978 | 665  | 1.1064          | 35700320          |
| 0.158         | 0.7030 | 670  | 1.1057          | 35964016          |
| 0.1366        | 0.7082 | 675  | 1.1048          | 36235568          |
| 0.2311        | 0.7135 | 680  | 1.1053          | 36507520          |
| 0.1222        | 0.7187 | 685  | 1.1042          | 36772320          |
| 0.1399        | 0.7240 | 690  | 1.1031          | 37040632          |
| 0.172         | 0.7292 | 695  | 1.1030          | 37303152          |
| 0.2098        | 0.7345 | 700  | 1.1059          | 37574576          |
| 0.1788        | 0.7397 | 705  | 1.1047          | 37848808          |
| 0.1323        | 0.7450 | 710  | 1.1021          | 38114488          |
| 0.2065        | 0.7502 | 715  | 1.1008          | 38388584          |
| 0.1683        | 0.7555 | 720  | 1.1033          | 38657616          |
| 0.2276        | 0.7607 | 725  | 1.1036          | 38926072          |
| 0.2007        | 0.7660 | 730  | 1.1019          | 39197256          |
| 0.196         | 0.7712 | 735  | 1.1004          | 39466864          |
| 0.1794        | 0.7764 | 740  | 1.1041          | 39737096          |
| 0.1614        | 0.7817 | 745  | 1.1046          | 40005096          |
| 0.2611        | 0.7869 | 750  | 1.1013          | 40271312          |
| 0.1707        | 0.7922 | 755  | 1.1014          | 40537096          |
| 0.1234        | 0.7974 | 760  | 1.1021          | 40798272          |
| 0.1902        | 0.8027 | 765  | 1.1026          | 41068576          |
| 0.2074        | 0.8079 | 770  | 1.1006          | 41333440          |
| 0.1535        | 0.8132 | 775  | 1.1004          | 41596272          |
| 0.2085        | 0.8184 | 780  | 1.1006          | 41867760          |
| 0.1914        | 0.8237 | 785  | 1.1007          | 42135872          |
| 0.1402        | 0.8289 | 790  | 1.1004          | 42405584          |
| 0.1844        | 0.8342 | 795  | 1.1001          | 42668992          |
| 0.2101        | 0.8394 | 800  | 1.0976          | 42936872          |
| 0.1892        | 0.8446 | 805  | 1.0993          | 43203248          |
| 0.2207        | 0.8499 | 810  | 1.1008          | 43470648          |
| 0.1441        | 0.8551 | 815  | 1.0994          | 43739272          |
| 0.146         | 0.8604 | 820  | 1.0985          | 44009920          |
| 0.1725        | 0.8656 | 825  | 1.0992          | 44274912          |
| 0.1492        | 0.8709 | 830  | 1.1002          | 44546640          |
| 0.2031        | 0.8761 | 835  | 1.0984          | 44810120          |
| 0.2081        | 0.8814 | 840  | 1.0982          | 45079088          |
| 0.1331        | 0.8866 | 845  | 1.0996          | 45351432          |
| 0.1989        | 0.8919 | 850  | 1.0978          | 45611400          |
| 0.1079        | 0.8971 | 855  | 1.0967          | 45874904          |
| 0.2258        | 0.9024 | 860  | 1.0979          | 46145128          |
| 0.1287        | 0.9076 | 865  | 1.0974          | 46410800          |
| 0.1404        | 0.9128 | 870  | 1.0974          | 46678552          |
| 0.1972        | 0.9181 | 875  | 1.0967          | 46939000          |
| 0.2395        | 0.9233 | 880  | 1.0958          | 47221520          |
| 0.1464        | 0.9286 | 885  | 1.0970          | 47499040          |
| 0.1881        | 0.9338 | 890  | 1.0965          | 47765808          |
| 0.1543        | 0.9391 | 895  | 1.0971          | 48035152          |
| 0.1311        | 0.9443 | 900  | 1.0966          | 48303032          |
| 0.1793        | 0.9496 | 905  | 1.0966          | 48574536          |
| 0.1552        | 0.9548 | 910  | 1.0959          | 48856360          |
| 0.1798        | 0.9601 | 915  | 1.0976          | 49126944          |
| 0.1749        | 0.9653 | 920  | 1.0967          | 49397832          |
| 0.157         | 0.9706 | 925  | 1.0939          | 49671648          |
| 0.1835        | 0.9758 | 930  | 1.0943          | 49936592          |
| 0.2019        | 0.9810 | 935  | 1.0973          | 50203752          |
| 0.1426        | 0.9863 | 940  | 1.0959          | 50476704          |
| 0.132         | 0.9915 | 945  | 1.0961          | 50742304          |
| 0.2386        | 0.9968 | 950  | 1.0962          | 51013336          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
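The hyperparameters listed above map onto a TRL training configuration roughly as sketched below. This is a reconstruction, not the author's actual script: the use of `SFTTrainer` is inferred from the card's `trl`/`sft` tags, the `output_dir` is assumed from the model name, the training dataset is left as a placeholder because the card does not document it, and exact argument names can vary between `trl` versions.

```python
# Sketch: approximate training setup inferred from the hyperparameters above.
# Assumptions: trl's SFTTrainer was used (per the card's tags); the dataset
# is a placeholder, since the training data is not documented in this card.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the
    # default optimizer settings, so no override is needed here.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=config,
    train_dataset=...,  # training data not documented in this card
)
trainer.train()
```

Note that the effective batch size of 128 comes from the per-device batch size of 8 multiplied by 16 gradient-accumulation steps, matching the `total_train_batch_size` reported above.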