---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter12_sftsd0
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter12_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.6294
- Num Input Tokens Seen: 4751776
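
Below is a minimal sketch of loading and prompting this checkpoint with `transformers`. The repo id is inferred from the model name and card owner and should be verified against the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Inferred repo id (card owner + model name); verify on the Hub before use.
repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```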

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they might map onto a TRL configuration):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
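
The `trl` and `sft` tags suggest this run used TRL's `SFTTrainer`. The sketch below shows how the hyperparameters above might map onto an `SFTConfig` (TRL ~0.9-era API, matching Transformers 4.44.0); the dataset loading is a placeholder, since the training data is not documented in this card.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "google/gemma-2-2b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: the actual training dataset is not documented in this card.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter12_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults.
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```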

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4105        | 0.0511 | 5    | 1.2861          | 260792            |
| 0.7212        | 0.1021 | 10   | 1.3450          | 513752            |
| 0.4618        | 0.1532 | 15   | 1.5903          | 758800            |
| 0.269         | 0.2042 | 20   | 1.8248          | 1000344           |
| 0.1064        | 0.2553 | 25   | 2.0278          | 1244312           |
| 0.0632        | 0.3063 | 30   | 2.2668          | 1480632           |
| 0.0349        | 0.3574 | 35   | 2.3784          | 1715600           |
| 0.0345        | 0.4084 | 40   | 2.4896          | 1961024           |
| 0.0309        | 0.4595 | 45   | 2.5375          | 2202712           |
| 0.0238        | 0.5105 | 50   | 2.5525          | 2440616           |
| 0.0229        | 0.5616 | 55   | 2.6030          | 2686984           |
| 0.0246        | 0.6126 | 60   | 2.6495          | 2928608           |
| 0.0206        | 0.6637 | 65   | 2.6637          | 3184600           |
| 0.0224        | 0.7147 | 70   | 2.6544          | 3423976           |
| 0.0233        | 0.7658 | 75   | 2.6538          | 3672776           |
| 0.0249        | 0.8168 | 80   | 2.6576          | 3911528           |
| 0.0214        | 0.8679 | 85   | 2.6595          | 4164584           |
| 0.0232        | 0.9190 | 90   | 2.6339          | 4412224           |
| 0.0222        | 0.9700 | 95   | 2.6302          | 4650264           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1