---
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter5_sftsd1
    results: []
---

collapse_gemma-2-9b_hs2_replace_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6182
  • Num Input Tokens Seen: 4642096

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
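The total train batch size above follows from the per-device batch size and gradient accumulation. A minimal sketch of that arithmetic in plain Python (the dictionary and the single-device assumption are illustrative, not from the training script):

```python
# Hyperparameters copied from the list above (names mirror the bullet points).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 4,            # per-device train batch size
    "eval_batch_size": 16,
    "seed": 1,
    "gradient_accumulation_steps": 32,
    "num_epochs": 1,
}

# Effective (total) train batch size, assuming a single device:
# per-device batch size x gradient accumulation steps.
total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # → 128, matching total_train_batch_size above
```

With multiple devices, the total would additionally be multiplied by the device count; the value 128 reported here is consistent with a single device.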

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1356        | 0.0513 | 5    | 1.0985          | 236640            |
| 0.5158        | 0.1027 | 10   | 1.1915          | 478096            |
| 0.1867        | 0.1540 | 15   | 1.3803          | 713224            |
| 0.0692        | 0.2053 | 20   | 1.4815          | 950388            |
| 0.0275        | 0.2567 | 25   | 1.3822          | 1195036           |
| 0.0254        | 0.3080 | 30   | 1.4820          | 1434668           |
| 0.0248        | 0.3593 | 35   | 1.5498          | 1669304           |
| 0.0226        | 0.4107 | 40   | 1.5781          | 1908432           |
| 0.0247        | 0.4620 | 45   | 1.5221          | 2150908           |
| 0.0224        | 0.5133 | 50   | 1.4731          | 2397208           |
| 0.0289        | 0.5646 | 55   | 1.4650          | 2634352           |
| 0.0217        | 0.6160 | 60   | 1.4817          | 2867608           |
| 0.0255        | 0.6673 | 65   | 1.5039          | 3114572           |
| 0.0207        | 0.7186 | 70   | 1.5013          | 3357172           |
| 0.0214        | 0.7700 | 75   | 1.4934          | 3593844           |
| 0.0231        | 0.8213 | 80   | 1.5160          | 3833908           |
| 0.0205        | 0.8726 | 85   | 1.5363          | 4071676           |
| 0.0219        | 0.9240 | 90   | 1.5761          | 4314868           |
| 0.0244        | 0.9753 | 95   | 1.6040          | 4546468           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1