metadata

license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter1_sftsd0
    results: []

collapse_gemma-2-27b_hs2_replace_iter1_sftsd0

This model is a fine-tuned version of google/gemma-2-27b. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.9043
Num Input Tokens Seen: 5253020
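
The loss above can be turned into a perplexity if it is the mean per-token cross-entropy in nats. That is the usual convention for causal language-model fine-tuning with the Transformers Trainer, but it is an assumption here, since the card does not say how the loss is defined. A minimal sketch (the function name is illustrative):

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss reported on this card.
final_eval_loss = 0.9043

# Assumption: the loss is a natural-log cross-entropy averaged over tokens.
print(f"eval perplexity ~ {perplexity(final_eval_loss):.2f}")
```

Under that assumption, a lower loss maps monotonically to a lower perplexity, so the two can be used interchangeably when comparing checkpoints of the same model and tokenizer.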

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 4
eval_batch_size: 16
seed: 0
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
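
The batch-size and warmup figures above can be cross-checked with a little arithmetic. The sketch below makes three assumptions that the card does not state: a single training device (consistent with 4 × 32 = 128), a total step count estimated from the last logged row of the results table (step 95 at epoch 0.9703), and a warmup length equal to the warmup ratio times the total steps, rounded up. The schedule is modeled as a linear ramp followed by a constant rate, which is the usual meaning of constant_with_warmup. All helper names are illustrative and not part of any library.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def constant_with_warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


# Values taken from the hyperparameter list.
base_lr = 8e-06
per_device_batch = 4
grad_accum_steps = 32
warmup_ratio = 0.05

# Assumption: one device, since 4 * 32 already equals the reported 128.
total_batch = effective_batch_size(per_device_batch, grad_accum_steps)

# Estimate: the last logged row is step 95 at epoch 0.9703 of a 1-epoch run.
estimated_total_steps = round(95 / 0.9703)

# Assumption: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)

print("total batch size:", total_batch)
print("estimated total steps:", estimated_total_steps)
print("warmup steps:", warmup_steps)
for step in (0, 1, warmup_steps, 50):
    print(step, constant_with_warmup_lr(step, base_lr, warmup_steps))
```

With so few warmup steps, the learning rate sits at its full value of 8e-06 for nearly the whole epoch.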

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.1282	0
1.0178	0.0511	5	0.9807	272284
0.9631	0.1021	10	0.9519	541544
0.933	0.1532	15	0.9407	815788
0.911	0.2043	20	0.9333	1087364
0.9322	0.2553	25	0.9283	1353816
0.9306	0.3064	30	0.9250	1626852
0.93	0.3575	35	0.9222	1893740
0.9036	0.4086	40	0.9192	2168380
0.9166	0.4596	45	0.9175	2438380
0.9158	0.5107	50	0.9154	2708844
0.9438	0.5618	55	0.9137	2978352
0.9321	0.6128	60	0.9119	3244148
0.9048	0.6639	65	0.9103	3518100
1.0015	0.7150	70	0.9100	3784544
0.8605	0.7660	75	0.9086	4055360
0.9524	0.8171	80	0.9077	4326216
0.9025	0.8682	85	0.9069	4595508
0.8468	0.9192	90	0.9062	4869076
0.8756	0.9703	95	0.9047	5142272
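
A few derived quantities make the table easier to read. The sketch below uses only the first and last logged rows, plus the total batch size of 128 from the hyperparameter list. It assumes every optimizer step consumes exactly 128 examples and that "Input Tokens Seen" counts all tokens fed to the model, neither of which the card confirms. Variable names are illustrative.

```python
# First and last logged rows of the results table.
first_step, first_val_loss, first_tokens = 0, 1.1282, 0
last_step, last_val_loss, last_tokens = 95, 0.9047, 5142272

# From the hyperparameter list.
total_train_batch_size = 128

# Average tokens consumed per optimizer step over the logged range.
tokens_per_step = (last_tokens - first_tokens) / (last_step - first_step)

# Assumption: each step processes exactly total_train_batch_size examples.
tokens_per_example = tokens_per_step / total_train_batch_size

# Relative drop in validation loss between the first and last logged rows.
relative_drop = (first_val_loss - last_val_loss) / first_val_loss

print(f"tokens per step:    {tokens_per_step:.0f}")
print(f"tokens per example: {tokens_per_example:.0f}")
print(f"validation loss reduction: {relative_drop:.1%}")
```

Most of the validation-loss reduction occurs within the first five steps (1.1282 to 0.9807); the remaining rows show a slow, steady decline.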

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
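
To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library. The mapping of display names to distribution names (for example, PyTorch is installed as `torch`) and the helper name are assumptions for illustration, and an exact match is not necessarily required to use the model.

```python
from importlib import metadata

# Versions listed on this card, keyed by pip distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_report(expected: dict) -> dict:
    """Map each package to (wanted, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in version_report(EXPECTED_VERSIONS).items():
    status = "ok" if ok else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```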