collapse_gemma-2-2b_hs2_replace_iter12_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6294
  • Num Input Tokens Seen: 4751776
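
Below is a minimal loading and inference sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the repo id in the title; the prompt is illustrative only, and BF16 loading is assumed from the checkpoint's stored tensor type.

```python
# Hedged sketch: loads the fine-tuned checkpoint and generates a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

# Illustrative prompt, not from the original card.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```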

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto TrainingArguments follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
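
A minimal sketch of the listed hyperparameters expressed as transformers.TrainingArguments; output_dir is an assumption, everything not listed above is left at Trainer defaults (which use an AdamW variant rather than plain Adam), and the total train batch size of 128 falls out of 8 x 16.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter12_sftsd0",  # assumed name
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```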

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4105        | 0.0511 | 5    | 1.2861          | 260792            |
| 0.7212        | 0.1021 | 10   | 1.3450          | 513752            |
| 0.4618        | 0.1532 | 15   | 1.5903          | 758800            |
| 0.2690        | 0.2042 | 20   | 1.8248          | 1000344           |
| 0.1064        | 0.2553 | 25   | 2.0278          | 1244312           |
| 0.0632        | 0.3063 | 30   | 2.2668          | 1480632           |
| 0.0349        | 0.3574 | 35   | 2.3784          | 1715600           |
| 0.0345        | 0.4084 | 40   | 2.4896          | 1961024           |
| 0.0309        | 0.4595 | 45   | 2.5375          | 2202712           |
| 0.0238        | 0.5105 | 50   | 2.5525          | 2440616           |
| 0.0229        | 0.5616 | 55   | 2.6030          | 2686984           |
| 0.0246        | 0.6126 | 60   | 2.6495          | 2928608           |
| 0.0206        | 0.6637 | 65   | 2.6637          | 3184600           |
| 0.0224        | 0.7147 | 70   | 2.6544          | 3423976           |
| 0.0233        | 0.7658 | 75   | 2.6538          | 3672776           |
| 0.0249        | 0.8168 | 80   | 2.6576          | 3911528           |
| 0.0214        | 0.8679 | 85   | 2.6595          | 4164584           |
| 0.0232        | 0.9190 | 90   | 2.6339          | 4412224           |
| 0.0222        | 0.9700 | 95   | 2.6302          | 4650264           |
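
The table shows training loss collapsing toward roughly 0.02 while validation loss climbs from 1.2861 at step 5 to about 2.63 by the end of the epoch. A short matplotlib sketch, not part of the original card, re-plots the logged values to make that divergence visible:

```python
import matplotlib.pyplot as plt

# Values copied verbatim from the training results table (step 0 omitted,
# since no training loss was logged there).
steps = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
train = [1.4105, 0.7212, 0.4618, 0.2690, 0.1064, 0.0632, 0.0349, 0.0345, 0.0309,
         0.0238, 0.0229, 0.0246, 0.0206, 0.0224, 0.0233, 0.0249, 0.0214, 0.0232, 0.0222]
val   = [1.2861, 1.3450, 1.5903, 1.8248, 2.0278, 2.2668, 2.3784, 2.4896, 2.5375,
         2.5525, 2.6030, 2.6495, 2.6637, 2.6544, 2.6538, 2.6576, 2.6595, 2.6339, 2.6302]

plt.plot(steps, train, label="training loss")
plt.plot(steps, val, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```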

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1