metadata
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter1_sftsd0
    results: []

collapse_gemma-2-27b_hs2_replace_iter1_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9043
  • Num Input Tokens Seen: 5253020
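For a rough sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which this card does not state; the variable names are illustrative.

```python
import math

# Evaluation loss as reported in this card.
eval_loss = 0.9043

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity {perplexity:.3f}")
```

If the loss were computed differently (for example, averaged per sequence), this conversion would not apply.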

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
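The relationship between these values can be checked with a little arithmetic. The following is a minimal sketch, not the training script: the device count is inferred from the batch sizes, the total step count is estimated from the last logged row of the results table (step 95 at epoch 0.9703), and warmup steps are assumed to be the ceiling of ratio times total steps. `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 32
total_train_batch_size = 128
warmup_ratio = 0.05
learning_rate = 8e-06

# Effective batch size = per-device batch * accumulation steps * devices.
# Solving for the device count gives 1; this is an inference, not a stated fact.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Assumption: total optimizer steps are estimated from the last logged row
# (step 95 at epoch 0.9703); the card never states the exact count.
est_total_steps = round(95 / 0.9703)
warmup_steps = math.ceil(est_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Constant-with-warmup: linear ramp to the base rate, then flat."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(f"devices (inferred): {num_devices}")
print(f"estimated total steps: {est_total_steps}, warmup steps: {warmup_steps}")
print(f"lr at step 2: {lr_at(2):.2e}, lr at step 50: {lr_at(50):.2e}")
```

Under these assumptions the learning rate ramps up over the first few optimizer steps and then stays at 8e-06 for the rest of the epoch.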

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.0178        | 0.0511 | 5    | 0.9807          | 272284            |
| 0.9631        | 0.1021 | 10   | 0.9519          | 541544            |
| 0.933         | 0.1532 | 15   | 0.9407          | 815788            |
| 0.911         | 0.2043 | 20   | 0.9333          | 1087364           |
| 0.9322        | 0.2553 | 25   | 0.9283          | 1353816           |
| 0.9306        | 0.3064 | 30   | 0.9250          | 1626852           |
| 0.93          | 0.3575 | 35   | 0.9222          | 1893740           |
| 0.9036        | 0.4086 | 40   | 0.9192          | 2168380           |
| 0.9166        | 0.4596 | 45   | 0.9175          | 2438380           |
| 0.9158        | 0.5107 | 50   | 0.9154          | 2708844           |
| 0.9438        | 0.5618 | 55   | 0.9137          | 2978352           |
| 0.9321        | 0.6128 | 60   | 0.9119          | 3244148           |
| 0.9048        | 0.6639 | 65   | 0.9103          | 3518100           |
| 1.0015        | 0.7150 | 70   | 0.9100          | 3784544           |
| 0.8605        | 0.7660 | 75   | 0.9086          | 4055360           |
| 0.9524        | 0.8171 | 80   | 0.9077          | 4326216           |
| 0.9025        | 0.8682 | 85   | 0.9069          | 4595508           |
| 0.8468        | 0.9192 | 90   | 0.9062          | 4869076           |
| 0.8756        | 0.9703 | 95   | 0.9047          | 5142272           |
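A few summary figures can be derived from the logged rows. The sketch below uses three rows copied from the results above (steps 0, 50, and 95) and the total_train_batch_size of 128 from the hyperparameter list. The per-sequence figure is only an average and assumes "Input Tokens Seen" counts every token fed to the model, possibly including padding.

```python
# Selected rows from the training results:
# (step, validation loss, input tokens seen).
rows = [
    (0, 1.1282, 0),
    (50, 0.9154, 2708844),
    (95, 0.9047, 5142272),
]

_, first_loss, _ = rows[0]
_, mid_loss, _ = rows[1]
last_step, last_loss, last_tokens = rows[-1]

# Relative drop in validation loss from step 0 to the last logged step.
relative_drop = (first_loss - last_loss) / first_loss

# Share of that drop already achieved by step 50.
share_by_step_50 = (first_loss - mid_loss) / (first_loss - last_loss)

# Average tokens per optimizer step, and per sequence assuming
# 128 sequences per step (total_train_batch_size).
tokens_per_step = last_tokens / last_step
tokens_per_sequence = tokens_per_step / 128

print(f"relative validation loss drop: {relative_drop:.1%}")
print(f"share of drop reached by step 50: {share_by_step_50:.1%}")
print(f"avg tokens per step: {tokens_per_step:.0f}")
print(f"avg tokens per sequence: {tokens_per_sequence:.0f}")
```

Most of the validation loss improvement occurs in the first half of the epoch, with only small gains afterwards.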

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
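
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library; the distribution names (`torch` for PyTorch, for instance) and the helper names `installed_version` and `find_mismatches` are assumptions for illustration, and matching versions exactly may be stricter than necessary.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
# The "+cu121" suffix on PyTorch is a local build tag and is ignored below.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Compare only the public version, dropping local tags like "+cu121".
        if found is None or found.split("+")[0] != want:
            mismatches[name] = (want, found)
    return mismatches


for name, (want, found) in find_mismatches(EXPECTED).items():
    print(f"{name}: card lists {want}, found {found}")
```

The script prints one line per package that is missing or differs from the card and prints nothing when everything matches.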