---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9331
- Num Input Tokens Seen: 13190464
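
As a quick start, here is a minimal inference sketch. It assumes the fine-tuned weights are hosted under this repository id and that your hardware can hold a 27B checkpoint in bf16; the repo path, prompt, and generation settings are illustrative and not part of the original card.

```python
# Minimal inference sketch. Assumptions: this repo id hosts the merged
# fine-tuned weights, and bfloat16 + device_map="auto" fit your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2"  # hypothetical hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # shard across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```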

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL run follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
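
As a rough guide to how these settings map onto a TRL SFT run, here is a minimal sketch. Only the hyperparameters listed above come from this card; the dataset, text field, and exact TRL API (which varies across versions) are assumptions.

```python
# Hedged reproduction sketch: hyperparameters are taken from the card above;
# everything about the data is a placeholder, since the dataset is unknown.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_id = "google/gemma-2-27b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)  # a real 27B run needs multi-GPU sharding

dataset = load_dataset("json", data_files="train.jsonl")  # placeholder: real data unknown

args = SFTConfig(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,  # 4 x 32 = total train batch size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=2,
    dataset_text_field="text",  # assumption: a single raw-text column named "text"
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,  # older TRL versions; newer ones take processing_class
)
trainer.train()
```

Note that the total train batch size of 128 follows from a per-device batch of 4 with 32 gradient-accumulation steps on a single process; with more processes, the accumulation steps would shrink accordingly. The default AdamW settings in `transformers` already match the optimizer listed above (betas=(0.9, 0.999), epsilon=1e-08), so no explicit optimizer arguments are needed.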

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.3244        | 0.0184 | 5    | 1.0518          | 240912            |
| 2.2442        | 0.0368 | 10   | 0.9933          | 480908            |
| 2.1347        | 0.0551 | 15   | 0.9797          | 713948            |
| 2.0779        | 0.0735 | 20   | 0.9788          | 953808            |
| 1.6988        | 0.0919 | 25   | 0.9776          | 1202776           |
| 1.6197        | 0.1103 | 30   | 0.9794          | 1447736           |
| 1.5939        | 0.1286 | 35   | 0.9787          | 1694460           |
| 1.391         | 0.1470 | 40   | 0.9787          | 1934204           |
| 1.1954        | 0.1654 | 45   | 0.9771          | 2171112           |
| 1.1232        | 0.1838 | 50   | 0.9747          | 2409548           |
| 1.1961        | 0.2022 | 55   | 0.9722          | 2648484           |
| 0.9664        | 0.2205 | 60   | 0.9710          | 2887652           |
| 1.1064        | 0.2389 | 65   | 0.9667          | 3127516           |
| 1.0085        | 0.2573 | 70   | 0.9611          | 3368304           |
| 0.8056        | 0.2757 | 75   | 0.9606          | 3603000           |
| 0.9106        | 0.2941 | 80   | 0.9576          | 3850976           |
| 0.9384        | 0.3124 | 85   | 0.9544          | 4094752           |
| 0.8953        | 0.3308 | 90   | 0.9521          | 4345860           |
| 0.8928        | 0.3492 | 95   | 0.9511          | 4588756           |
| 0.7887        | 0.3676 | 100  | 0.9490          | 4837704           |
| 0.9092        | 0.3859 | 105  | 0.9497          | 5078112           |
| 0.7458        | 0.4043 | 110  | 0.9471          | 5318968           |
| 0.762         | 0.4227 | 115  | 0.9463          | 5556324           |
| 0.8916        | 0.4411 | 120  | 0.9436          | 5803288           |
| 0.791         | 0.4595 | 125  | 0.9442          | 6042868           |
| 0.9366        | 0.4778 | 130  | 0.9417          | 6282932           |
| 0.8494        | 0.4962 | 135  | 0.9418          | 6522180           |
| 1.0078        | 0.5146 | 140  | 0.9399          | 6773624           |
| 0.9159        | 0.5330 | 145  | 0.9380          | 7011976           |
| 1.0115        | 0.5513 | 150  | 0.9390          | 7257008           |
| 0.84          | 0.5697 | 155  | 0.9380          | 7501580           |
| 0.8987        | 0.5881 | 160  | 0.9393          | 7742124           |
| 0.9589        | 0.6065 | 165  | 0.9370          | 7981768           |
| 0.8201        | 0.6249 | 170  | 0.9371          | 8222304           |
| 0.7601        | 0.6432 | 175  | 0.9348          | 8469856           |
| 0.7465        | 0.6616 | 180  | 0.9378          | 8710912           |
| 0.8689        | 0.6800 | 185  | 0.9381          | 8949132           |
| 0.6945        | 0.6984 | 190  | 0.9343          | 9196744           |
| 0.7289        | 0.7167 | 195  | 0.9358          | 9434412           |
| 0.583         | 0.7351 | 200  | 0.9336          | 9677156           |
| 0.6272        | 0.7535 | 205  | 0.9356          | 9916792           |
| 0.7919        | 0.7719 | 210  | 0.9353          | 10162084          |
| 0.9377        | 0.7903 | 215  | 0.9334          | 10403240          |
| 0.7397        | 0.8086 | 220  | 0.9330          | 10650280          |
| 0.6871        | 0.8270 | 225  | 0.9342          | 10885396          |
| 0.9175        | 0.8454 | 230  | 0.9339          | 11138056          |
| 0.621         | 0.8638 | 235  | 0.9336          | 11382612          |
| 0.8007        | 0.8822 | 240  | 0.9324          | 11620516          |
| 0.691         | 0.9005 | 245  | 0.9353          | 11865444          |
| 0.7516        | 0.9189 | 250  | 0.9329          | 12109276          |
| 0.9474        | 0.9373 | 255  | 0.9326          | 12346224          |
| 0.7389        | 0.9557 | 260  | 0.9335          | 12594020          |
| 0.7986        | 0.9740 | 265  | 0.9310          | 12844164          |
| 0.9011        | 0.9924 | 270  | 0.9335          | 13090264          |


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1