Metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter13_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter13_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5914
  • Num Input Tokens Seen: 4764832
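
A minimal inference sketch is below. The Hub repo id is an assumption inferred from the model-index name above and may need adjustment:

```python
# Minimal inference sketch; the repo id is assumed from the model-index
# name and is not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter13_sftsd2"  # assumed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world.", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```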

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged reproduction sketch follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
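
For concreteness, here is a minimal sketch of how this configuration might be reproduced with TRL's SFTTrainer (the `trl` and `sft` tags suggest that trainer was used). The dataset path and output directory are placeholders, since the training data and script for this run are not documented:

```python
# Hedged reproduction sketch of the listed hyperparameters.
# The dataset is a placeholder; the actual training data is not documented.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "google/gemma-2-2b"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

train_dataset = load_dataset("json", data_files="train.jsonl")["train"]  # placeholder

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter13_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,        # matches the logging cadence in the results table
    eval_strategy="steps",
    eval_steps=5,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```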

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6665        | 0.0511 | 5    | 1.2782          | 248088            |
| 0.7132        | 0.1021 | 10   | 1.3229          | 492944            |
| 0.4332        | 0.1532 | 15   | 1.5331          | 737200            |
| 0.2622        | 0.2042 | 20   | 1.7169          | 982128            |
| 0.1360        | 0.2553 | 25   | 1.9671          | 1228256           |
| 0.0770        | 0.3063 | 30   | 2.1798          | 1475016           |
| 0.0375        | 0.3574 | 35   | 2.3843          | 1718752           |
| 0.0255        | 0.4084 | 40   | 2.5202          | 1966432           |
| 0.0209        | 0.4595 | 45   | 2.5795          | 2208784           |
| 0.0194        | 0.5105 | 50   | 2.5974          | 2459208           |
| 0.0199        | 0.5616 | 55   | 2.6020          | 2700064           |
| 0.0211        | 0.6126 | 60   | 2.6116          | 2947288           |
| 0.0206        | 0.6637 | 65   | 2.6139          | 3192944           |
| 0.0200        | 0.7147 | 70   | 2.6100          | 3432568           |
| 0.0204        | 0.7658 | 75   | 2.5829          | 3677032           |
| 0.0213        | 0.8168 | 80   | 2.5711          | 3922712           |
| 0.0209        | 0.8679 | 85   | 2.5732          | 4172608           |
| 0.0192        | 0.9190 | 90   | 2.5755          | 4418512           |
| 0.0197        | 0.9700 | 95   | 2.5900          | 4665416           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1