# collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9554
- Num Input Tokens Seen: 29176432
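
Below is a minimal inference sketch for loading this checkpoint with the `transformers` library. The repo id is taken from this model's Hugging Face page; the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch; requires `transformers`, `torch`, and `accelerate`
# (the latter for device_map="auto"). The prompt is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```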
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
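
For reference, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a hypothetical reconstruction, not the actual training script; the `output_dir` is illustrative, and the Adam betas and epsilon listed above are the optimizer defaults.

```python
# Hypothetical reconstruction of the training configuration above;
# the real training script is not part of this repository.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 per device * 32 steps = 128 total batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # warmup over the first 5% of steps
    num_train_epochs=1,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults.
)
```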
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.5082 | 0.0086 | 5 | 1.2044 | 253900 |
1.2297 | 0.0173 | 10 | 1.0969 | 505088 |
1.0669 | 0.0259 | 15 | 1.0475 | 759448 |
0.9161 | 0.0346 | 20 | 1.0276 | 1013980 |
0.6699 | 0.0432 | 25 | 1.0402 | 1264204 |
0.5472 | 0.0518 | 30 | 1.0525 | 1517512 |
0.4024 | 0.0605 | 35 | 1.0378 | 1770520 |
0.3633 | 0.0691 | 40 | 1.0346 | 2012376 |
0.3315 | 0.0777 | 45 | 1.0235 | 2266452 |
0.3059 | 0.0864 | 50 | 1.0148 | 2521528 |
0.3055 | 0.0950 | 55 | 1.0137 | 2775812 |
0.2359 | 0.1037 | 60 | 1.0030 | 3027688 |
0.3126 | 0.1123 | 65 | 1.0022 | 3283636 |
0.2403 | 0.1209 | 70 | 0.9972 | 3543064 |
0.3407 | 0.1296 | 75 | 0.9931 | 3792188 |
0.2366 | 0.1382 | 80 | 0.9914 | 4039528 |
0.2589 | 0.1469 | 85 | 0.9923 | 4292552 |
0.2351 | 0.1555 | 90 | 0.9896 | 4546400 |
0.1684 | 0.1641 | 95 | 0.9880 | 4795116 |
0.172 | 0.1728 | 100 | 0.9846 | 5046368 |
0.19 | 0.1814 | 105 | 0.9834 | 5308588 |
0.2633 | 0.1901 | 110 | 0.9820 | 5559684 |
0.3733 | 0.1987 | 115 | 0.9821 | 5813992 |
0.2145 | 0.2073 | 120 | 0.9801 | 6070416 |
0.2155 | 0.2160 | 125 | 0.9830 | 6315596 |
0.2225 | 0.2246 | 130 | 0.9814 | 6569088 |
0.2722 | 0.2332 | 135 | 0.9773 | 6818112 |
0.2117 | 0.2419 | 140 | 0.9763 | 7063572 |
0.2697 | 0.2505 | 145 | 0.9750 | 7313212 |
0.1635 | 0.2592 | 150 | 0.9748 | 7563296 |
0.2014 | 0.2678 | 155 | 0.9749 | 7814756 |
0.2966 | 0.2764 | 160 | 0.9732 | 8071104 |
0.2152 | 0.2851 | 165 | 0.9732 | 8321916 |
0.2225 | 0.2937 | 170 | 0.9732 | 8575656 |
0.218 | 0.3024 | 175 | 0.9725 | 8829960 |
0.2213 | 0.3110 | 180 | 0.9709 | 9077472 |
0.2019 | 0.3196 | 185 | 0.9735 | 9327976 |
0.2356 | 0.3283 | 190 | 0.9727 | 9586088 |
0.282 | 0.3369 | 195 | 0.9703 | 9836880 |
0.1755 | 0.3456 | 200 | 0.9711 | 10084064 |
0.1982 | 0.3542 | 205 | 0.9711 | 10342304 |
0.2235 | 0.3628 | 210 | 0.9694 | 10594072 |
0.2343 | 0.3715 | 215 | 0.9692 | 10848940 |
0.2224 | 0.3801 | 220 | 0.9675 | 11105216 |
0.1573 | 0.3887 | 225 | 0.9683 | 11357564 |
0.232 | 0.3974 | 230 | 0.9683 | 11608688 |
0.2024 | 0.4060 | 235 | 0.9656 | 11861536 |
0.206 | 0.4147 | 240 | 0.9658 | 12109876 |
0.2774 | 0.4233 | 245 | 0.9673 | 12358492 |
0.2034 | 0.4319 | 250 | 0.9673 | 12613104 |
0.2507 | 0.4406 | 255 | 0.9648 | 12866204 |
0.2835 | 0.4492 | 260 | 0.9661 | 13119448 |
0.2383 | 0.4579 | 265 | 0.9680 | 13367496 |
0.2672 | 0.4665 | 270 | 0.9667 | 13620264 |
0.1784 | 0.4751 | 275 | 0.9643 | 13878680 |
0.1693 | 0.4838 | 280 | 0.9653 | 14127536 |
0.2884 | 0.4924 | 285 | 0.9677 | 14381756 |
0.2109 | 0.5011 | 290 | 0.9638 | 14643852 |
0.1975 | 0.5097 | 295 | 0.9641 | 14897344 |
0.2218 | 0.5183 | 300 | 0.9651 | 15142956 |
0.2154 | 0.5270 | 305 | 0.9652 | 15392580 |
0.1529 | 0.5356 | 310 | 0.9634 | 15649732 |
0.1644 | 0.5442 | 315 | 0.9660 | 15899204 |
0.2834 | 0.5529 | 320 | 0.9646 | 16150936 |
0.1629 | 0.5615 | 325 | 0.9613 | 16395960 |
0.1851 | 0.5702 | 330 | 0.9612 | 16655372 |
0.2276 | 0.5788 | 335 | 0.9634 | 16915404 |
0.2364 | 0.5874 | 340 | 0.9615 | 17171280 |
0.3287 | 0.5961 | 345 | 0.9599 | 17430220 |
0.2272 | 0.6047 | 350 | 0.9587 | 17676740 |
0.1756 | 0.6134 | 355 | 0.9613 | 17926836 |
0.2325 | 0.6220 | 360 | 0.9615 | 18180824 |
0.2313 | 0.6306 | 365 | 0.9595 | 18430524 |
0.1806 | 0.6393 | 370 | 0.9590 | 18684524 |
0.212 | 0.6479 | 375 | 0.9587 | 18939748 |
0.145 | 0.6566 | 380 | 0.9590 | 19193300 |
0.1975 | 0.6652 | 385 | 0.9595 | 19440700 |
0.2746 | 0.6738 | 390 | 0.9604 | 19694592 |
0.299 | 0.6825 | 395 | 0.9587 | 19945404 |
0.1257 | 0.6911 | 400 | 0.9578 | 20196008 |
0.2559 | 0.6997 | 405 | 0.9581 | 20442928 |
0.2001 | 0.7084 | 410 | 0.9594 | 20695556 |
0.2035 | 0.7170 | 415 | 0.9589 | 20943484 |
0.1544 | 0.7257 | 420 | 0.9574 | 21196736 |
0.2173 | 0.7343 | 425 | 0.9579 | 21449560 |
0.1656 | 0.7429 | 430 | 0.9585 | 21702020 |
0.2824 | 0.7516 | 435 | 0.9593 | 21952844 |
0.1876 | 0.7602 | 440 | 0.9601 | 22205932 |
0.2108 | 0.7689 | 445 | 0.9585 | 22454488 |
0.2672 | 0.7775 | 450 | 0.9576 | 22704452 |
0.1782 | 0.7861 | 455 | 0.9559 | 22955940 |
0.2339 | 0.7948 | 460 | 0.9549 | 23207052 |
0.2428 | 0.8034 | 465 | 0.9558 | 23456708 |
0.2038 | 0.8121 | 470 | 0.9555 | 23709712 |
0.2188 | 0.8207 | 475 | 0.9556 | 23963108 |
0.149 | 0.8293 | 480 | 0.9567 | 24215948 |
0.1509 | 0.8380 | 485 | 0.9577 | 24471656 |
0.1932 | 0.8466 | 490 | 0.9582 | 24719948 |
0.1685 | 0.8552 | 495 | 0.9556 | 24965208 |
0.1658 | 0.8639 | 500 | 0.9560 | 25218600 |
0.2438 | 0.8725 | 505 | 0.9582 | 25476704 |
0.2235 | 0.8812 | 510 | 0.9572 | 25724700 |
0.1904 | 0.8898 | 515 | 0.9544 | 25973760 |
0.2485 | 0.8984 | 520 | 0.9546 | 26231120 |
0.2104 | 0.9071 | 525 | 0.9548 | 26480832 |
0.1977 | 0.9157 | 530 | 0.9575 | 26738864 |
0.2057 | 0.9244 | 535 | 0.9570 | 26997660 |
0.1918 | 0.9330 | 540 | 0.9548 | 27253932 |
0.1763 | 0.9416 | 545 | 0.9556 | 27508012 |
0.1706 | 0.9503 | 550 | 0.9588 | 27758020 |
0.2287 | 0.9589 | 555 | 0.9556 | 28012216 |
0.213 | 0.9676 | 560 | 0.9543 | 28270144 |
0.1938 | 0.9762 | 565 | 0.9555 | 28520404 |
0.2117 | 0.9848 | 570 | 0.9572 | 28774464 |
0.2136 | 0.9935 | 575 | 0.9559 | 29028248 |
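
As a rough aid to reading the table, cross-entropy loss converts to perplexity via exp(loss), so the final evaluation loss of 0.9554 corresponds to a perplexity of about 2.60:

```python
import math

# Perplexity implied by the final evaluation loss.
print(math.exp(0.9554))  # ~2.60
```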
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1