collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9554
  • Num Input Tokens Seen: 29176432
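
As a quick usage reference, the sketch below shows one way to load this checkpoint with the Transformers library; the repository id is taken from the model name above, and the prompt, dtype, and generation settings are illustrative assumptions rather than part of this card.

```python
# Minimal loading sketch (assumes the repo id above and a bfloat16-capable GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: load in bf16 to fit a 9B model comfortably
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("Write a short sentence about model fine-tuning.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```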

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map to TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
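
For reference, here is a hedged sketch of how the values above might be expressed as Transformers TrainingArguments. The actual training script, dataset, and model/collator setup are not part of this card, so the argument mapping below is an assumption, not the author's code.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The real training script is not included in this card, so names here are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=0,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; not stated explicitly in the card
)
```

With the constant_with_warmup scheduler, the learning rate ramps up over the first 5% of optimizer steps (warmup_ratio 0.05) and then stays fixed at 8e-06 for the rest of the single epoch.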

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.5082 | 0.0086 | 5 | 1.2044 | 253900 |
| 1.2297 | 0.0173 | 10 | 1.0969 | 505088 |
| 1.0669 | 0.0259 | 15 | 1.0475 | 759448 |
| 0.9161 | 0.0346 | 20 | 1.0276 | 1013980 |
| 0.6699 | 0.0432 | 25 | 1.0402 | 1264204 |
| 0.5472 | 0.0518 | 30 | 1.0525 | 1517512 |
| 0.4024 | 0.0605 | 35 | 1.0378 | 1770520 |
| 0.3633 | 0.0691 | 40 | 1.0346 | 2012376 |
| 0.3315 | 0.0777 | 45 | 1.0235 | 2266452 |
| 0.3059 | 0.0864 | 50 | 1.0148 | 2521528 |
| 0.3055 | 0.0950 | 55 | 1.0137 | 2775812 |
| 0.2359 | 0.1037 | 60 | 1.0030 | 3027688 |
| 0.3126 | 0.1123 | 65 | 1.0022 | 3283636 |
| 0.2403 | 0.1209 | 70 | 0.9972 | 3543064 |
| 0.3407 | 0.1296 | 75 | 0.9931 | 3792188 |
| 0.2366 | 0.1382 | 80 | 0.9914 | 4039528 |
| 0.2589 | 0.1469 | 85 | 0.9923 | 4292552 |
| 0.2351 | 0.1555 | 90 | 0.9896 | 4546400 |
| 0.1684 | 0.1641 | 95 | 0.9880 | 4795116 |
| 0.172 | 0.1728 | 100 | 0.9846 | 5046368 |
| 0.19 | 0.1814 | 105 | 0.9834 | 5308588 |
| 0.2633 | 0.1901 | 110 | 0.9820 | 5559684 |
| 0.3733 | 0.1987 | 115 | 0.9821 | 5813992 |
| 0.2145 | 0.2073 | 120 | 0.9801 | 6070416 |
| 0.2155 | 0.2160 | 125 | 0.9830 | 6315596 |
| 0.2225 | 0.2246 | 130 | 0.9814 | 6569088 |
| 0.2722 | 0.2332 | 135 | 0.9773 | 6818112 |
| 0.2117 | 0.2419 | 140 | 0.9763 | 7063572 |
| 0.2697 | 0.2505 | 145 | 0.9750 | 7313212 |
| 0.1635 | 0.2592 | 150 | 0.9748 | 7563296 |
| 0.2014 | 0.2678 | 155 | 0.9749 | 7814756 |
| 0.2966 | 0.2764 | 160 | 0.9732 | 8071104 |
| 0.2152 | 0.2851 | 165 | 0.9732 | 8321916 |
| 0.2225 | 0.2937 | 170 | 0.9732 | 8575656 |
| 0.218 | 0.3024 | 175 | 0.9725 | 8829960 |
| 0.2213 | 0.3110 | 180 | 0.9709 | 9077472 |
| 0.2019 | 0.3196 | 185 | 0.9735 | 9327976 |
| 0.2356 | 0.3283 | 190 | 0.9727 | 9586088 |
| 0.282 | 0.3369 | 195 | 0.9703 | 9836880 |
| 0.1755 | 0.3456 | 200 | 0.9711 | 10084064 |
| 0.1982 | 0.3542 | 205 | 0.9711 | 10342304 |
| 0.2235 | 0.3628 | 210 | 0.9694 | 10594072 |
| 0.2343 | 0.3715 | 215 | 0.9692 | 10848940 |
| 0.2224 | 0.3801 | 220 | 0.9675 | 11105216 |
| 0.1573 | 0.3887 | 225 | 0.9683 | 11357564 |
| 0.232 | 0.3974 | 230 | 0.9683 | 11608688 |
| 0.2024 | 0.4060 | 235 | 0.9656 | 11861536 |
| 0.206 | 0.4147 | 240 | 0.9658 | 12109876 |
| 0.2774 | 0.4233 | 245 | 0.9673 | 12358492 |
| 0.2034 | 0.4319 | 250 | 0.9673 | 12613104 |
| 0.2507 | 0.4406 | 255 | 0.9648 | 12866204 |
| 0.2835 | 0.4492 | 260 | 0.9661 | 13119448 |
| 0.2383 | 0.4579 | 265 | 0.9680 | 13367496 |
| 0.2672 | 0.4665 | 270 | 0.9667 | 13620264 |
| 0.1784 | 0.4751 | 275 | 0.9643 | 13878680 |
| 0.1693 | 0.4838 | 280 | 0.9653 | 14127536 |
| 0.2884 | 0.4924 | 285 | 0.9677 | 14381756 |
| 0.2109 | 0.5011 | 290 | 0.9638 | 14643852 |
| 0.1975 | 0.5097 | 295 | 0.9641 | 14897344 |
| 0.2218 | 0.5183 | 300 | 0.9651 | 15142956 |
| 0.2154 | 0.5270 | 305 | 0.9652 | 15392580 |
| 0.1529 | 0.5356 | 310 | 0.9634 | 15649732 |
| 0.1644 | 0.5442 | 315 | 0.9660 | 15899204 |
| 0.2834 | 0.5529 | 320 | 0.9646 | 16150936 |
| 0.1629 | 0.5615 | 325 | 0.9613 | 16395960 |
| 0.1851 | 0.5702 | 330 | 0.9612 | 16655372 |
| 0.2276 | 0.5788 | 335 | 0.9634 | 16915404 |
| 0.2364 | 0.5874 | 340 | 0.9615 | 17171280 |
| 0.3287 | 0.5961 | 345 | 0.9599 | 17430220 |
| 0.2272 | 0.6047 | 350 | 0.9587 | 17676740 |
| 0.1756 | 0.6134 | 355 | 0.9613 | 17926836 |
| 0.2325 | 0.6220 | 360 | 0.9615 | 18180824 |
| 0.2313 | 0.6306 | 365 | 0.9595 | 18430524 |
| 0.1806 | 0.6393 | 370 | 0.9590 | 18684524 |
| 0.212 | 0.6479 | 375 | 0.9587 | 18939748 |
| 0.145 | 0.6566 | 380 | 0.9590 | 19193300 |
| 0.1975 | 0.6652 | 385 | 0.9595 | 19440700 |
| 0.2746 | 0.6738 | 390 | 0.9604 | 19694592 |
| 0.299 | 0.6825 | 395 | 0.9587 | 19945404 |
| 0.1257 | 0.6911 | 400 | 0.9578 | 20196008 |
| 0.2559 | 0.6997 | 405 | 0.9581 | 20442928 |
| 0.2001 | 0.7084 | 410 | 0.9594 | 20695556 |
| 0.2035 | 0.7170 | 415 | 0.9589 | 20943484 |
| 0.1544 | 0.7257 | 420 | 0.9574 | 21196736 |
| 0.2173 | 0.7343 | 425 | 0.9579 | 21449560 |
| 0.1656 | 0.7429 | 430 | 0.9585 | 21702020 |
| 0.2824 | 0.7516 | 435 | 0.9593 | 21952844 |
| 0.1876 | 0.7602 | 440 | 0.9601 | 22205932 |
| 0.2108 | 0.7689 | 445 | 0.9585 | 22454488 |
| 0.2672 | 0.7775 | 450 | 0.9576 | 22704452 |
| 0.1782 | 0.7861 | 455 | 0.9559 | 22955940 |
| 0.2339 | 0.7948 | 460 | 0.9549 | 23207052 |
| 0.2428 | 0.8034 | 465 | 0.9558 | 23456708 |
| 0.2038 | 0.8121 | 470 | 0.9555 | 23709712 |
| 0.2188 | 0.8207 | 475 | 0.9556 | 23963108 |
| 0.149 | 0.8293 | 480 | 0.9567 | 24215948 |
| 0.1509 | 0.8380 | 485 | 0.9577 | 24471656 |
| 0.1932 | 0.8466 | 490 | 0.9582 | 24719948 |
| 0.1685 | 0.8552 | 495 | 0.9556 | 24965208 |
| 0.1658 | 0.8639 | 500 | 0.9560 | 25218600 |
| 0.2438 | 0.8725 | 505 | 0.9582 | 25476704 |
| 0.2235 | 0.8812 | 510 | 0.9572 | 25724700 |
| 0.1904 | 0.8898 | 515 | 0.9544 | 25973760 |
| 0.2485 | 0.8984 | 520 | 0.9546 | 26231120 |
| 0.2104 | 0.9071 | 525 | 0.9548 | 26480832 |
| 0.1977 | 0.9157 | 530 | 0.9575 | 26738864 |
| 0.2057 | 0.9244 | 535 | 0.9570 | 26997660 |
| 0.1918 | 0.9330 | 540 | 0.9548 | 27253932 |
| 0.1763 | 0.9416 | 545 | 0.9556 | 27508012 |
| 0.1706 | 0.9503 | 550 | 0.9588 | 27758020 |
| 0.2287 | 0.9589 | 555 | 0.9556 | 28012216 |
| 0.213 | 0.9676 | 560 | 0.9543 | 28270144 |
| 0.1938 | 0.9762 | 565 | 0.9555 | 28520404 |
| 0.2117 | 0.9848 | 570 | 0.9572 | 28774464 |
| 0.2136 | 0.9935 | 575 | 0.9559 | 29028248 |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
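
For reproducibility, a small script like the one below could compare a local environment against the versions listed above; it is a sketch, and the +cu121 suffix on the PyTorch entry only denotes the CUDA 12.1 build, which depends on the install channel.

```python
# Quick environment check against the framework versions listed above.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0",       # card lists "2.4.0+cu121"; the suffix is the CUDA build
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(name, module.__version__, "(expected", expected[name] + ")")
```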