---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0
  results: []
---
# collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b), trained with supervised fine-tuning (SFT) via TRL on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9492
- Num Input Tokens Seen: 19618104
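
The card does not ship a usage snippet, so here is a minimal inference sketch. It assumes the checkpoint is hosted under this repository name and that a `transformers` release with Gemma 2 support (4.42 or later) is installed; the repository id, prompt, and generation settings are illustrative assumptions, not part of the original card.

```python
# Hypothetical usage sketch; the repo id below is a placeholder for wherever
# this checkpoint is actually hosted on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 9B model needs roughly 18 GB of memory in bf16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```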

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto `TrainingArguments` follows the list:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size 4 × gradient_accumulation_steps 32, i.e. a single device)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
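
The training script itself is not part of this card, but the list above corresponds closely to `transformers.TrainingArguments` fields. The sketch below is a hedged reconstruction; `output_dir`, `bf16`, and the logging/eval cadence are assumptions (the cadence is inferred from the 5-step spacing of the results table).

```python
# Hypothetical mapping of the hyperparameters above onto TrainingArguments.
# output_dir, bf16, and the eval/logging cadence are assumptions, not logged facts.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # "train_batch_size" above
    per_device_eval_batch_size=16,   # "eval_batch_size" above
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                       # assumption: Gemma 2 is typically trained in bf16
    logging_steps=5,                 # inferred from the results table below
    eval_strategy="steps",
    eval_steps=5,
)
```

For a TRL SFT run such as this one (per the `trl` and `sft` tags), these arguments would then be passed to `trl`'s `SFTTrainer` together with the base model and dataset.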

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.2703 | 0.0130 | 5 | 1.1837 | 260728 |
| 1.2545 | 0.0261 | 10 | 1.0745 | 511292 |
| 0.9867 | 0.0391 | 15 | 1.0219 | 760004 |
| 0.7077 | 0.0522 | 20 | 1.0147 | 1015876 |
| 0.5384 | 0.0652 | 25 | 1.0220 | 1270300 |
| 0.5591 | 0.0783 | 30 | 1.0193 | 1525228 |
| 0.4475 | 0.0913 | 35 | 1.0166 | 1784804 |
| 0.3602 | 0.1044 | 40 | 1.0124 | 2036584 |
| 0.3623 | 0.1174 | 45 | 1.0037 | 2297140 |
| 0.3845 | 0.1305 | 50 | 0.9974 | 2559416 |
| 0.2587 | 0.1435 | 55 | 0.9923 | 2810020 |
| 0.4471 | 0.1566 | 60 | 0.9912 | 3060436 |
| 0.3047 | 0.1696 | 65 | 0.9868 | 3321640 |
| 0.3731 | 0.1827 | 70 | 0.9832 | 3573720 |
| 0.3265 | 0.1957 | 75 | 0.9839 | 3828028 |
| 0.2885 | 0.2088 | 80 | 0.9812 | 4080608 |
| 0.3128 | 0.2218 | 85 | 0.9791 | 4336288 |
| 0.3204 | 0.2349 | 90 | 0.9770 | 4590108 |
| 0.3495 | 0.2479 | 95 | 0.9758 | 4853076 |
| 0.2884 | 0.2610 | 100 | 0.9760 | 5107028 |
| 0.3117 | 0.2740 | 105 | 0.9728 | 5361252 |
| 0.3231 | 0.2871 | 110 | 0.9732 | 5615724 |
| 0.3288 | 0.3001 | 115 | 0.9715 | 5871856 |
| 0.3798 | 0.3132 | 120 | 0.9698 | 6127844 |
| 0.2902 | 0.3262 | 125 | 0.9698 | 6385324 |
| 0.3605 | 0.3393 | 130 | 0.9706 | 6633264 |
| 0.3544 | 0.3523 | 135 | 0.9679 | 6886668 |
| 0.34 | 0.3654 | 140 | 0.9670 | 7149304 |
| 0.3764 | 0.3784 | 145 | 0.9674 | 7405164 |
| 0.2529 | 0.3915 | 150 | 0.9675 | 7653688 |
| 0.2816 | 0.4045 | 155 | 0.9672 | 7913220 |
| 0.2044 | 0.4176 | 160 | 0.9648 | 8167932 |
| 0.2825 | 0.4306 | 165 | 0.9658 | 8418852 |
| 0.2702 | 0.4436 | 170 | 0.9650 | 8677864 |
| 0.3071 | 0.4567 | 175 | 0.9650 | 8935764 |
| 0.3253 | 0.4697 | 180 | 0.9642 | 9187056 |
| 0.2927 | 0.4828 | 185 | 0.9626 | 9442708 |
| 0.2876 | 0.4958 | 190 | 0.9634 | 9701192 |
| 0.3425 | 0.5089 | 195 | 0.9624 | 9955308 |
| 0.3433 | 0.5219 | 200 | 0.9602 | 10214732 |
| 0.3315 | 0.5350 | 205 | 0.9611 | 10466412 |
| 0.2934 | 0.5480 | 210 | 0.9605 | 10714628 |
| 0.2463 | 0.5611 | 215 | 0.9612 | 10976808 |
| 0.3642 | 0.5741 | 220 | 0.9613 | 11234876 |
| 0.3245 | 0.5872 | 225 | 0.9589 | 11495408 |
| 0.2885 | 0.6002 | 230 | 0.9589 | 11752512 |
| 0.3555 | 0.6133 | 235 | 0.9600 | 12002952 |
| 0.2814 | 0.6263 | 240 | 0.9583 | 12260908 |
| 0.3228 | 0.6394 | 245 | 0.9574 | 12519812 |
| 0.3228 | 0.6524 | 250 | 0.9576 | 12782436 |
| 0.3823 | 0.6655 | 255 | 0.9572 | 13042344 |
| 0.3539 | 0.6785 | 260 | 0.9562 | 13307776 |
| 0.3418 | 0.6916 | 265 | 0.9571 | 13567712 |
| 0.2592 | 0.7046 | 270 | 0.9593 | 13823848 |
| 0.2523 | 0.7177 | 275 | 0.9564 | 14073252 |
| 0.2883 | 0.7307 | 280 | 0.9557 | 14325632 |
| 0.2877 | 0.7438 | 285 | 0.9546 | 14580592 |
| 0.3691 | 0.7568 | 290 | 0.9545 | 14834352 |
| 0.2924 | 0.7699 | 295 | 0.9546 | 15098672 |
| 0.3078 | 0.7829 | 300 | 0.9533 | 15350204 |
| 0.3201 | 0.7960 | 305 | 0.9544 | 15609792 |
| 0.3147 | 0.8090 | 310 | 0.9544 | 15869296 |
| 0.3097 | 0.8221 | 315 | 0.9523 | 16121416 |
| 0.2708 | 0.8351 | 320 | 0.9522 | 16378908 |
| 0.2285 | 0.8481 | 325 | 0.9549 | 16637160 |
| 0.2825 | 0.8612 | 330 | 0.9535 | 16895604 |
| 0.3189 | 0.8742 | 335 | 0.9523 | 17153840 |
| 0.263 | 0.8873 | 340 | 0.9529 | 17408728 |
| 0.247 | 0.9003 | 345 | 0.9521 | 17664248 |
| 0.2309 | 0.9134 | 350 | 0.9532 | 17925640 |
| 0.2487 | 0.9264 | 355 | 0.9513 | 18183340 |
| 0.3177 | 0.9395 | 360 | 0.9518 | 18443996 |
| 0.2997 | 0.9525 | 365 | 0.9521 | 18692904 |
| 0.3384 | 0.9656 | 370 | 0.9516 | 18947432 |
| 0.2958 | 0.9786 | 375 | 0.9513 | 19210912 |
| 0.3001 | 0.9917 | 380 | 0.9484 | 19465112 |
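
As a rough consistency check (these figures are derived from the table, not logged directly), the token counts line up with the batch configuration above:

```python
# Back-of-the-envelope arithmetic from the logged values above.
tokens_at_step_380 = 19_465_112   # "Input Tokens Seen" at step 380
total_batch_size = 128            # total_train_batch_size from the hyperparameters

tokens_per_step = tokens_at_step_380 / 380                   # ~51,224 tokens per optimizer step
avg_tokens_per_example = tokens_per_step / total_batch_size  # ~400 tokens per sequence
print(f"{tokens_per_step:,.0f} tokens/step, ~{avg_tokens_per_example:.0f} tokens/example")
```

That works out to an average of roughly 400 input tokens per training example.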

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1