collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0989
- Num Input Tokens Seen: 97347672
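For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card. The step-0 value is the first validation loss from the training results table.

```python
import math

# Values copied from this card.
eval_loss = 1.0989          # final evaluation loss
initial_eval_loss = 1.3909  # validation loss logged at step 0


def perplexity(loss_nats: float) -> float:
    """Convert a loss to perplexity.

    Assumption: the loss is a mean per-token cross-entropy in nats.
    """
    return math.exp(loss_nats)


final_ppl = perplexity(eval_loss)
initial_ppl = perplexity(initial_eval_loss)

print(f"initial perplexity: {initial_ppl:.3f}")
print(f"final perplexity:   {final_ppl:.3f}")
```

Under that assumption, the final loss corresponds to a perplexity of roughly 3, since 1.0989 is very close to ln 3.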
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
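The derived entries in this list can be checked from the others. The sketch below is a rough consistency check, not the training script: it assumes a single device (the card does not state the device count), estimates the steps per epoch from the last logged row of the results table (step 1825 at epoch 0.9978), and assumes warmup steps are rounded up.

```python
import math

# Hyperparameters copied from the card.
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one device. The card does not state the device count.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Estimate optimizer steps per epoch from the last logged table row.
last_step, last_epoch = 1825, 0.9978
est_steps_per_epoch = round(last_step / last_epoch)

# Assumption: warmup length is the ratio times total steps, rounded up.
est_warmup_steps = math.ceil(warmup_ratio * est_steps_per_epoch)

print(f"total_train_batch_size: {total_train_batch_size}")
print(f"estimated steps per epoch: {est_steps_per_epoch}")
print(f"estimated warmup steps: {est_warmup_steps}")
```

With these assumptions the effective batch size matches the listed total_train_batch_size of 128, and the learning rate would reach its constant value of 8e-06 after roughly 92 optimizer steps.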
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.7174 | 0.0027 | 5 | 1.3909 | 266872 |
1.6111 | 0.0055 | 10 | 1.3836 | 533480 |
1.6539 | 0.0082 | 15 | 1.3631 | 798752 |
1.5642 | 0.0109 | 20 | 1.3338 | 1064288 |
1.6072 | 0.0137 | 25 | 1.2943 | 1338000 |
1.3696 | 0.0164 | 30 | 1.2521 | 1599120 |
1.3166 | 0.0191 | 35 | 1.2246 | 1860216 |
1.2964 | 0.0219 | 40 | 1.1980 | 2128840 |
1.1507 | 0.0246 | 45 | 1.1899 | 2388008 |
1.0819 | 0.0273 | 50 | 1.2120 | 2646856 |
0.9372 | 0.0301 | 55 | 1.2402 | 2912968 |
0.8892 | 0.0328 | 60 | 1.2765 | 3168864 |
0.66 | 0.0355 | 65 | 1.3017 | 3441624 |
0.6357 | 0.0383 | 70 | 1.2970 | 3709896 |
0.5769 | 0.0410 | 75 | 1.3026 | 3977528 |
0.462 | 0.0437 | 80 | 1.3114 | 4243944 |
0.4417 | 0.0465 | 85 | 1.3083 | 4504216 |
0.3348 | 0.0492 | 90 | 1.2831 | 4774880 |
0.3377 | 0.0519 | 95 | 1.2634 | 5037776 |
0.2137 | 0.0547 | 100 | 1.2311 | 5309968 |
0.2449 | 0.0574 | 105 | 1.2508 | 5576496 |
0.2124 | 0.0601 | 110 | 1.2161 | 5841760 |
0.143 | 0.0629 | 115 | 1.2250 | 6108464 |
0.1943 | 0.0656 | 120 | 1.2239 | 6379128 |
0.2212 | 0.0683 | 125 | 1.2073 | 6641224 |
0.2491 | 0.0711 | 130 | 1.2136 | 6907672 |
0.1946 | 0.0738 | 135 | 1.1992 | 7171984 |
0.2037 | 0.0765 | 140 | 1.2021 | 7442496 |
0.1578 | 0.0793 | 145 | 1.1970 | 7705752 |
0.167 | 0.0820 | 150 | 1.2024 | 7974728 |
0.1721 | 0.0847 | 155 | 1.1955 | 8241928 |
0.1745 | 0.0875 | 160 | 1.1828 | 8512416 |
0.1834 | 0.0902 | 165 | 1.1924 | 8783416 |
0.195 | 0.0930 | 170 | 1.1873 | 9051112 |
0.1586 | 0.0957 | 175 | 1.1778 | 9320688 |
0.253 | 0.0984 | 180 | 1.1801 | 9584016 |
0.1373 | 0.1012 | 185 | 1.1771 | 9852488 |
0.1344 | 0.1039 | 190 | 1.1859 | 10123040 |
0.1969 | 0.1066 | 195 | 1.1784 | 10395672 |
0.1556 | 0.1094 | 200 | 1.1778 | 10667736 |
0.1264 | 0.1121 | 205 | 1.1808 | 10931120 |
0.1954 | 0.1148 | 210 | 1.1764 | 11198256 |
0.1253 | 0.1176 | 215 | 1.1737 | 11460848 |
0.1299 | 0.1203 | 220 | 1.1787 | 11728464 |
0.1308 | 0.1230 | 225 | 1.1751 | 11996864 |
0.1496 | 0.1258 | 230 | 1.1661 | 12265288 |
0.1035 | 0.1285 | 235 | 1.1726 | 12531848 |
0.1098 | 0.1312 | 240 | 1.1701 | 12801200 |
0.1262 | 0.1340 | 245 | 1.1676 | 13069368 |
0.1547 | 0.1367 | 250 | 1.1687 | 13337576 |
0.112 | 0.1394 | 255 | 1.1666 | 13599152 |
0.1908 | 0.1422 | 260 | 1.1635 | 13861552 |
0.1643 | 0.1449 | 265 | 1.1591 | 14125760 |
0.1394 | 0.1476 | 270 | 1.1643 | 14394032 |
0.1175 | 0.1504 | 275 | 1.1596 | 14666560 |
0.1004 | 0.1531 | 280 | 1.1663 | 14930664 |
0.0763 | 0.1558 | 285 | 1.1717 | 15192168 |
0.1751 | 0.1586 | 290 | 1.1629 | 15452408 |
0.0951 | 0.1613 | 295 | 1.1617 | 15718392 |
0.1146 | 0.1640 | 300 | 1.1615 | 15986280 |
0.148 | 0.1668 | 305 | 1.1575 | 16250824 |
0.1276 | 0.1695 | 310 | 1.1590 | 16517136 |
0.1763 | 0.1722 | 315 | 1.1576 | 16782864 |
0.1252 | 0.1750 | 320 | 1.1569 | 17051768 |
0.1352 | 0.1777 | 325 | 1.1591 | 17326792 |
0.114 | 0.1804 | 330 | 1.1520 | 17591712 |
0.2198 | 0.1832 | 335 | 1.1503 | 17856864 |
0.1552 | 0.1859 | 340 | 1.1514 | 18117776 |
0.1374 | 0.1886 | 345 | 1.1528 | 18380616 |
0.112 | 0.1914 | 350 | 1.1516 | 18651352 |
0.1037 | 0.1941 | 355 | 1.1473 | 18919872 |
0.0999 | 0.1968 | 360 | 1.1535 | 19188072 |
0.1332 | 0.1996 | 365 | 1.1504 | 19453480 |
0.1847 | 0.2023 | 370 | 1.1541 | 19721152 |
0.071 | 0.2050 | 375 | 1.1536 | 19991456 |
0.1044 | 0.2078 | 380 | 1.1497 | 20254144 |
0.0751 | 0.2105 | 385 | 1.1494 | 20519720 |
0.0616 | 0.2132 | 390 | 1.1556 | 20783688 |
0.1662 | 0.2160 | 395 | 1.1532 | 21044896 |
0.1196 | 0.2187 | 400 | 1.1444 | 21312752 |
0.1694 | 0.2214 | 405 | 1.1478 | 21578272 |
0.2183 | 0.2242 | 410 | 1.1483 | 21848096 |
0.1136 | 0.2269 | 415 | 1.1427 | 22107584 |
0.1043 | 0.2296 | 420 | 1.1470 | 22363472 |
0.149 | 0.2324 | 425 | 1.1466 | 22638288 |
0.0572 | 0.2351 | 430 | 1.1492 | 22906352 |
0.0534 | 0.2378 | 435 | 1.1495 | 23162496 |
0.1239 | 0.2406 | 440 | 1.1466 | 23431224 |
0.1872 | 0.2433 | 445 | 1.1444 | 23701840 |
0.1271 | 0.2460 | 450 | 1.1474 | 23967120 |
0.0885 | 0.2488 | 455 | 1.1438 | 24233048 |
0.1312 | 0.2515 | 460 | 1.1429 | 24501112 |
0.1024 | 0.2542 | 465 | 1.1422 | 24771456 |
0.0965 | 0.2570 | 470 | 1.1436 | 25043256 |
0.0869 | 0.2597 | 475 | 1.1468 | 25308600 |
0.1391 | 0.2624 | 480 | 1.1416 | 25578912 |
0.1503 | 0.2652 | 485 | 1.1383 | 25846392 |
0.1101 | 0.2679 | 490 | 1.1464 | 26120528 |
0.1043 | 0.2706 | 495 | 1.1441 | 26383104 |
0.1182 | 0.2734 | 500 | 1.1391 | 26651176 |
0.0894 | 0.2761 | 505 | 1.1393 | 26915624 |
0.1544 | 0.2789 | 510 | 1.1397 | 27177368 |
0.1456 | 0.2816 | 515 | 1.1360 | 27447576 |
0.1584 | 0.2843 | 520 | 1.1373 | 27713256 |
0.099 | 0.2871 | 525 | 1.1432 | 27973272 |
0.1098 | 0.2898 | 530 | 1.1403 | 28228976 |
0.1108 | 0.2925 | 535 | 1.1378 | 28490496 |
0.1822 | 0.2953 | 540 | 1.1367 | 28765928 |
0.1631 | 0.2980 | 545 | 1.1340 | 29034624 |
0.1329 | 0.3007 | 550 | 1.1336 | 29302688 |
0.1422 | 0.3035 | 555 | 1.1334 | 29564784 |
0.1367 | 0.3062 | 560 | 1.1344 | 29840384 |
0.0854 | 0.3089 | 565 | 1.1351 | 30106576 |
0.1498 | 0.3117 | 570 | 1.1309 | 30378728 |
0.1591 | 0.3144 | 575 | 1.1319 | 30647712 |
0.1223 | 0.3171 | 580 | 1.1338 | 30917512 |
0.127 | 0.3199 | 585 | 1.1334 | 31187240 |
0.1535 | 0.3226 | 590 | 1.1310 | 31450016 |
0.1413 | 0.3253 | 595 | 1.1323 | 31715368 |
0.155 | 0.3281 | 600 | 1.1366 | 31979272 |
0.1003 | 0.3308 | 605 | 1.1341 | 32244656 |
0.0931 | 0.3335 | 610 | 1.1303 | 32510112 |
0.1586 | 0.3363 | 615 | 1.1299 | 32775624 |
0.1098 | 0.3390 | 620 | 1.1318 | 33045312 |
0.1126 | 0.3417 | 625 | 1.1315 | 33315344 |
0.1441 | 0.3445 | 630 | 1.1288 | 33577168 |
0.1377 | 0.3472 | 635 | 1.1275 | 33845000 |
0.1506 | 0.3499 | 640 | 1.1302 | 34113832 |
0.1645 | 0.3527 | 645 | 1.1304 | 34381504 |
0.1134 | 0.3554 | 650 | 1.1293 | 34647712 |
0.0611 | 0.3581 | 655 | 1.1297 | 34917120 |
0.1077 | 0.3609 | 660 | 1.1313 | 35184000 |
0.0922 | 0.3636 | 665 | 1.1290 | 35451320 |
0.1285 | 0.3663 | 670 | 1.1254 | 35713528 |
0.1775 | 0.3691 | 675 | 1.1269 | 35979648 |
0.1231 | 0.3718 | 680 | 1.1264 | 36251280 |
0.0869 | 0.3745 | 685 | 1.1267 | 36523080 |
0.0755 | 0.3773 | 690 | 1.1253 | 36792800 |
0.1084 | 0.3800 | 695 | 1.1256 | 37061576 |
0.165 | 0.3827 | 700 | 1.1245 | 37327536 |
0.0832 | 0.3855 | 705 | 1.1248 | 37595088 |
0.1449 | 0.3882 | 710 | 1.1255 | 37853128 |
0.1484 | 0.3909 | 715 | 1.1254 | 38118408 |
0.09 | 0.3937 | 720 | 1.1238 | 38389056 |
0.113 | 0.3964 | 725 | 1.1243 | 38660160 |
0.1209 | 0.3991 | 730 | 1.1241 | 38933408 |
0.1019 | 0.4019 | 735 | 1.1241 | 39203408 |
0.0974 | 0.4046 | 740 | 1.1241 | 39472776 |
0.1233 | 0.4073 | 745 | 1.1244 | 39735160 |
0.101 | 0.4101 | 750 | 1.1258 | 39997392 |
0.1051 | 0.4128 | 755 | 1.1240 | 40264792 |
0.1002 | 0.4155 | 760 | 1.1228 | 40531920 |
0.1039 | 0.4183 | 765 | 1.1228 | 40798440 |
0.1661 | 0.4210 | 770 | 1.1213 | 41065784 |
0.1615 | 0.4237 | 775 | 1.1208 | 41333064 |
0.1364 | 0.4265 | 780 | 1.1209 | 41604728 |
0.1646 | 0.4292 | 785 | 1.1200 | 41872328 |
0.1234 | 0.4319 | 790 | 1.1210 | 42139912 |
0.1313 | 0.4347 | 795 | 1.1219 | 42397864 |
0.1858 | 0.4374 | 800 | 1.1201 | 42663312 |
0.1786 | 0.4401 | 805 | 1.1196 | 42928792 |
0.1706 | 0.4429 | 810 | 1.1215 | 43193840 |
0.1255 | 0.4456 | 815 | 1.1221 | 43457720 |
0.1275 | 0.4483 | 820 | 1.1205 | 43721200 |
0.1364 | 0.4511 | 825 | 1.1197 | 43981648 |
0.1311 | 0.4538 | 830 | 1.1208 | 44243536 |
0.1631 | 0.4565 | 835 | 1.1188 | 44508888 |
0.0923 | 0.4593 | 840 | 1.1175 | 44771440 |
0.1563 | 0.4620 | 845 | 1.1187 | 45042352 |
0.1156 | 0.4648 | 850 | 1.1180 | 45308520 |
0.0881 | 0.4675 | 855 | 1.1167 | 45581968 |
0.1758 | 0.4702 | 860 | 1.1189 | 45843704 |
0.0746 | 0.4730 | 865 | 1.1177 | 46113504 |
0.1186 | 0.4757 | 870 | 1.1182 | 46385088 |
0.0735 | 0.4784 | 875 | 1.1213 | 46646384 |
0.2116 | 0.4812 | 880 | 1.1207 | 46917848 |
0.1127 | 0.4839 | 885 | 1.1182 | 47182968 |
0.1042 | 0.4866 | 890 | 1.1196 | 47446872 |
0.1461 | 0.4894 | 895 | 1.1195 | 47705216 |
0.0872 | 0.4921 | 900 | 1.1166 | 47967064 |
0.1396 | 0.4948 | 905 | 1.1179 | 48227312 |
0.0739 | 0.4976 | 910 | 1.1200 | 48489816 |
0.1089 | 0.5003 | 915 | 1.1187 | 48761464 |
0.1124 | 0.5030 | 920 | 1.1156 | 49033200 |
0.1094 | 0.5058 | 925 | 1.1175 | 49292616 |
0.0829 | 0.5085 | 930 | 1.1193 | 49552456 |
0.1111 | 0.5112 | 935 | 1.1186 | 49826560 |
0.1273 | 0.5140 | 940 | 1.1157 | 50094336 |
0.0794 | 0.5167 | 945 | 1.1146 | 50354616 |
0.1021 | 0.5194 | 950 | 1.1163 | 50621536 |
0.0732 | 0.5222 | 955 | 1.1158 | 50887048 |
0.0998 | 0.5249 | 960 | 1.1159 | 51151256 |
0.1084 | 0.5276 | 965 | 1.1155 | 51412592 |
0.1076 | 0.5304 | 970 | 1.1158 | 51688216 |
0.1269 | 0.5331 | 975 | 1.1162 | 51953136 |
0.1242 | 0.5358 | 980 | 1.1123 | 52220944 |
0.1208 | 0.5386 | 985 | 1.1121 | 52491440 |
0.1058 | 0.5413 | 990 | 1.1161 | 52756560 |
0.1193 | 0.5440 | 995 | 1.1158 | 53021512 |
0.1141 | 0.5468 | 1000 | 1.1148 | 53285184 |
0.0977 | 0.5495 | 1005 | 1.1143 | 53551680 |
0.1648 | 0.5522 | 1010 | 1.1152 | 53810640 |
0.0852 | 0.5550 | 1015 | 1.1145 | 54082336 |
0.0862 | 0.5577 | 1020 | 1.1149 | 54344200 |
0.1324 | 0.5604 | 1025 | 1.1131 | 54614792 |
0.1159 | 0.5632 | 1030 | 1.1129 | 54878824 |
0.1012 | 0.5659 | 1035 | 1.1133 | 55141696 |
0.1036 | 0.5686 | 1040 | 1.1129 | 55408024 |
0.1005 | 0.5714 | 1045 | 1.1143 | 55673224 |
0.099 | 0.5741 | 1050 | 1.1144 | 55932544 |
0.0815 | 0.5768 | 1055 | 1.1137 | 56204976 |
0.1428 | 0.5796 | 1060 | 1.1144 | 56473248 |
0.1014 | 0.5823 | 1065 | 1.1131 | 56736352 |
0.1504 | 0.5850 | 1070 | 1.1119 | 57002088 |
0.1862 | 0.5878 | 1075 | 1.1121 | 57270184 |
0.1479 | 0.5905 | 1080 | 1.1120 | 57535360 |
0.0756 | 0.5932 | 1085 | 1.1126 | 57795104 |
0.0757 | 0.5960 | 1090 | 1.1120 | 58060920 |
0.0789 | 0.5987 | 1095 | 1.1123 | 58324808 |
0.1916 | 0.6014 | 1100 | 1.1130 | 58597928 |
0.071 | 0.6042 | 1105 | 1.1138 | 58865144 |
0.0851 | 0.6069 | 1110 | 1.1145 | 59128872 |
0.1405 | 0.6096 | 1115 | 1.1131 | 59387232 |
0.1295 | 0.6124 | 1120 | 1.1129 | 59651736 |
0.1431 | 0.6151 | 1125 | 1.1125 | 59915648 |
0.0855 | 0.6178 | 1130 | 1.1110 | 60179680 |
0.0987 | 0.6206 | 1135 | 1.1112 | 60444144 |
0.1198 | 0.6233 | 1140 | 1.1123 | 60721040 |
0.0887 | 0.6260 | 1145 | 1.1132 | 60988096 |
0.0983 | 0.6288 | 1150 | 1.1129 | 61258776 |
0.1265 | 0.6315 | 1155 | 1.1103 | 61527712 |
0.1088 | 0.6342 | 1160 | 1.1103 | 61794696 |
0.1348 | 0.6370 | 1165 | 1.1102 | 62054248 |
0.1188 | 0.6397 | 1170 | 1.1095 | 62319720 |
0.1353 | 0.6424 | 1175 | 1.1085 | 62583520 |
0.1576 | 0.6452 | 1180 | 1.1094 | 62846952 |
0.144 | 0.6479 | 1185 | 1.1093 | 63109192 |
0.1301 | 0.6507 | 1190 | 1.1102 | 63376000 |
0.1294 | 0.6534 | 1195 | 1.1087 | 63649096 |
0.0582 | 0.6561 | 1200 | 1.1069 | 63912744 |
0.1204 | 0.6589 | 1205 | 1.1078 | 64183184 |
0.1144 | 0.6616 | 1210 | 1.1104 | 64449936 |
0.1283 | 0.6643 | 1215 | 1.1106 | 64721056 |
0.1285 | 0.6671 | 1220 | 1.1077 | 64989128 |
0.1485 | 0.6698 | 1225 | 1.1069 | 65260248 |
0.1212 | 0.6725 | 1230 | 1.1095 | 65526968 |
0.1125 | 0.6753 | 1235 | 1.1114 | 65799712 |
0.1163 | 0.6780 | 1240 | 1.1083 | 66060384 |
0.1114 | 0.6807 | 1245 | 1.1078 | 66326384 |
0.1346 | 0.6835 | 1250 | 1.1074 | 66593160 |
0.1733 | 0.6862 | 1255 | 1.1087 | 66862232 |
0.1032 | 0.6889 | 1260 | 1.1088 | 67126144 |
0.1231 | 0.6917 | 1265 | 1.1073 | 67392880 |
0.1085 | 0.6944 | 1270 | 1.1074 | 67654488 |
0.0974 | 0.6971 | 1275 | 1.1098 | 67925984 |
0.143 | 0.6999 | 1280 | 1.1086 | 68188296 |
0.0838 | 0.7026 | 1285 | 1.1080 | 68448560 |
0.0976 | 0.7053 | 1290 | 1.1078 | 68708864 |
0.1292 | 0.7081 | 1295 | 1.1082 | 68978056 |
0.1044 | 0.7108 | 1300 | 1.1066 | 69243568 |
0.0977 | 0.7135 | 1305 | 1.1062 | 69515200 |
0.1286 | 0.7163 | 1310 | 1.1068 | 69778120 |
0.0776 | 0.7190 | 1315 | 1.1091 | 70048128 |
0.1029 | 0.7217 | 1320 | 1.1088 | 70318440 |
0.1214 | 0.7245 | 1325 | 1.1049 | 70585080 |
0.1093 | 0.7272 | 1330 | 1.1057 | 70848152 |
0.1311 | 0.7299 | 1335 | 1.1075 | 71114672 |
0.1728 | 0.7327 | 1340 | 1.1072 | 71381384 |
0.0732 | 0.7354 | 1345 | 1.1070 | 71643776 |
0.1181 | 0.7381 | 1350 | 1.1057 | 71913728 |
0.1417 | 0.7409 | 1355 | 1.1044 | 72172432 |
0.1179 | 0.7436 | 1360 | 1.1044 | 72443984 |
0.1062 | 0.7463 | 1365 | 1.1062 | 72706712 |
0.1212 | 0.7491 | 1370 | 1.1062 | 72966264 |
0.0653 | 0.7518 | 1375 | 1.1064 | 73229856 |
0.0924 | 0.7545 | 1380 | 1.1070 | 73494304 |
0.1094 | 0.7573 | 1385 | 1.1106 | 73749552 |
0.077 | 0.7600 | 1390 | 1.1096 | 74017720 |
0.0813 | 0.7627 | 1395 | 1.1070 | 74288168 |
0.0769 | 0.7655 | 1400 | 1.1065 | 74552176 |
0.1347 | 0.7682 | 1405 | 1.1064 | 74824184 |
0.1235 | 0.7709 | 1410 | 1.1061 | 75096184 |
0.0896 | 0.7737 | 1415 | 1.1064 | 75362984 |
0.0577 | 0.7764 | 1420 | 1.1071 | 75627968 |
0.1215 | 0.7791 | 1425 | 1.1068 | 75894216 |
0.1134 | 0.7819 | 1430 | 1.1058 | 76159136 |
0.1123 | 0.7846 | 1435 | 1.1058 | 76419272 |
0.1205 | 0.7873 | 1440 | 1.1040 | 76685336 |
0.1133 | 0.7901 | 1445 | 1.1027 | 76957096 |
0.1248 | 0.7928 | 1450 | 1.1040 | 77217888 |
0.0701 | 0.7955 | 1455 | 1.1051 | 77488320 |
0.1221 | 0.7983 | 1460 | 1.1049 | 77757352 |
0.0991 | 0.8010 | 1465 | 1.1030 | 78023408 |
0.089 | 0.8037 | 1470 | 1.1040 | 78288408 |
0.1273 | 0.8065 | 1475 | 1.1056 | 78558544 |
0.0915 | 0.8092 | 1480 | 1.1055 | 78824864 |
0.0628 | 0.8119 | 1485 | 1.1039 | 79099824 |
0.1077 | 0.8147 | 1490 | 1.1039 | 79365768 |
0.1935 | 0.8174 | 1495 | 1.1029 | 79628872 |
0.0713 | 0.8201 | 1500 | 1.1043 | 79900120 |
0.0789 | 0.8229 | 1505 | 1.1056 | 80163648 |
0.123 | 0.8256 | 1510 | 1.1051 | 80432696 |
0.1038 | 0.8283 | 1515 | 1.1049 | 80694800 |
0.1123 | 0.8311 | 1520 | 1.1051 | 80959272 |
0.132 | 0.8338 | 1525 | 1.1041 | 81223352 |
0.0983 | 0.8366 | 1530 | 1.1025 | 81498336 |
0.1214 | 0.8393 | 1535 | 1.1018 | 81766760 |
0.1245 | 0.8420 | 1540 | 1.1025 | 82042960 |
0.1174 | 0.8448 | 1545 | 1.1025 | 82314968 |
0.1163 | 0.8475 | 1550 | 1.1035 | 82580512 |
0.1451 | 0.8502 | 1555 | 1.1045 | 82846256 |
0.0875 | 0.8530 | 1560 | 1.1049 | 83108696 |
0.1221 | 0.8557 | 1565 | 1.1036 | 83372312 |
0.1077 | 0.8584 | 1570 | 1.1036 | 83638264 |
0.0804 | 0.8612 | 1575 | 1.1036 | 83894392 |
0.0911 | 0.8639 | 1580 | 1.1037 | 84161608 |
0.1219 | 0.8666 | 1585 | 1.1021 | 84424240 |
0.1646 | 0.8694 | 1590 | 1.1017 | 84688504 |
0.0952 | 0.8721 | 1595 | 1.1033 | 84950376 |
0.062 | 0.8748 | 1600 | 1.1045 | 85213256 |
0.067 | 0.8776 | 1605 | 1.1032 | 85481688 |
0.1024 | 0.8803 | 1610 | 1.1029 | 85751680 |
0.0792 | 0.8830 | 1615 | 1.1034 | 86021808 |
0.1521 | 0.8858 | 1620 | 1.1025 | 86289944 |
0.0585 | 0.8885 | 1625 | 1.1008 | 86549808 |
0.092 | 0.8912 | 1630 | 1.1009 | 86813904 |
0.0779 | 0.8940 | 1635 | 1.1019 | 87072840 |
0.1313 | 0.8967 | 1640 | 1.1027 | 87337552 |
0.1186 | 0.8994 | 1645 | 1.1029 | 87600144 |
0.1334 | 0.9022 | 1650 | 1.1008 | 87867616 |
0.1465 | 0.9049 | 1655 | 1.1005 | 88132784 |
0.0896 | 0.9076 | 1660 | 1.1007 | 88400488 |
0.0844 | 0.9104 | 1665 | 1.1014 | 88661864 |
0.1133 | 0.9131 | 1670 | 1.0998 | 88930432 |
0.1254 | 0.9158 | 1675 | 1.0997 | 89197224 |
0.0872 | 0.9186 | 1680 | 1.1013 | 89459416 |
0.0894 | 0.9213 | 1685 | 1.1026 | 89714824 |
0.0966 | 0.9240 | 1690 | 1.1038 | 89973008 |
0.1306 | 0.9268 | 1695 | 1.1028 | 90242312 |
0.0966 | 0.9295 | 1700 | 1.1016 | 90516664 |
0.109 | 0.9322 | 1705 | 1.1015 | 90782296 |
0.1459 | 0.9350 | 1710 | 1.1029 | 91046216 |
0.1263 | 0.9377 | 1715 | 1.1004 | 91314592 |
0.1455 | 0.9404 | 1720 | 1.0983 | 91581592 |
0.1634 | 0.9432 | 1725 | 1.0994 | 91845936 |
0.0472 | 0.9459 | 1730 | 1.1003 | 92114760 |
0.0649 | 0.9486 | 1735 | 1.1010 | 92377432 |
0.1087 | 0.9514 | 1740 | 1.1003 | 92639184 |
0.1317 | 0.9541 | 1745 | 1.1001 | 92903664 |
0.1633 | 0.9568 | 1750 | 1.1007 | 93179160 |
0.094 | 0.9596 | 1755 | 1.1005 | 93443368 |
0.0891 | 0.9623 | 1760 | 1.1010 | 93705928 |
0.1061 | 0.9650 | 1765 | 1.1010 | 93969520 |
0.1436 | 0.9678 | 1770 | 1.1004 | 94239728 |
0.0803 | 0.9705 | 1775 | 1.0998 | 94506160 |
0.0969 | 0.9732 | 1780 | 1.1000 | 94762856 |
0.0774 | 0.9760 | 1785 | 1.0999 | 95024512 |
0.1144 | 0.9787 | 1790 | 1.1006 | 95295264 |
0.1836 | 0.9814 | 1795 | 1.0992 | 95565808 |
0.0989 | 0.9842 | 1800 | 1.0981 | 95836072 |
0.0852 | 0.9869 | 1805 | 1.0993 | 96100928 |
0.0859 | 0.9896 | 1810 | 1.0999 | 96375008 |
0.1832 | 0.9924 | 1815 | 1.0995 | 96648680 |
0.0998 | 0.9951 | 1820 | 1.0992 | 96918672 |
0.0748 | 0.9978 | 1825 | 1.0992 | 97185832 |
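The table rows are pipe-delimited, so they can be parsed with a few lines of code. The sketch below uses only a handful of rows copied from the table above; the helper name `parse_row` and the dictionary keys are illustrative, not part of any library. It finds the logged step with the lowest validation loss among the sampled rows and estimates the average number of input tokens per optimizer step.

```python
# A small excerpt of rows copied verbatim from the table above.
ROWS_TEXT = """\
No log | 0 | 0 | 1.3909 | 0 |
1.1507 | 0.0246 | 45 | 1.1899 | 2388008 |
0.66 | 0.0355 | 65 | 1.3017 | 3441624 |
0.1455 | 0.9404 | 1720 | 1.0983 | 91581592 |
0.0989 | 0.9842 | 1800 | 1.0981 | 95836072 |
0.0748 | 0.9978 | 1825 | 1.0992 | 97185832 |
"""


def parse_row(line: str) -> dict:
    """Parse one 'train | epoch | step | val | tokens |' row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens = cells
    return {
        # "No log" means no training loss had been logged yet.
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in ROWS_TEXT.splitlines() if line.strip()]

# Logged step with the lowest validation loss among the sampled rows.
best = min(rows, key=lambda r: r["val_loss"])

# Average input tokens consumed per optimizer step over the whole run.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

print(f"best step: {best['step']} (val loss {best['val_loss']:.4f})")
print(f"tokens per step: {tokens_per_step:.1f}")
```

Pasting the full table into `ROWS_TEXT` would extend the same analysis to every logged step.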
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
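To compare a local environment against these versions, a small check along the following lines may help. It is a sketch using only the standard library: the PyPI distribution names (for example `torch` for Pytorch) are assumptions, and the helper names `installed_version` and `version_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above.
# Assumption: these are the matching PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in version_mismatches(EXPECTED).items():
    print(f"{package}: card lists {wanted}, found {found}")
```

An exact string match is deliberately strict. A different CUDA build suffix on Pytorch, for instance, is reported as a mismatch even though it may behave identically.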
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd1
Base model
google/gemma-2-2b