Model save

Files changed (6)

README.md +19 -25
adapter_model.safetensors +1 -1
all_results.json +6 -11
runs/Jun10_07-38-05_48ddfe8e991f/events.out.tfevents.1718005108.48ddfe8e991f.245392.0 +2 -2
train_results.json +6 -6
trainer_state.json +0 -0

README.md CHANGED

@@ -2,13 +2,12 @@
 license: gemma
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
 base_model: google/gemma-2b
 datasets:
-- llama-duo/synth_summarize_dataset_dedup
 model-index:
 - name: gemma2b-summarize-gemini1_5flash-128k
   results: []
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 # gemma2b-summarize-gemini1_5flash-128k
-This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the llama-duo/synth_summarize_dataset_dedup dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.5119
 ## Model description
@@ -45,40 +44,35 @@ The following hyperparameters were used during training:
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 64
-- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 15
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.1289        | 1.0   | 208  | 2.5162          |
-| 1.0298        | 2.0   | 416  | 2.4574          |
-| 0.9905        | 3.0   | 624  | 2.4455          |
-| 0.9668        | 4.0   | 832  | 2.4518          |
-| 0.9507        | 5.0   | 1040 | 2.4578          |
-| 0.9348        | 6.0   | 1248 | 2.4685          |
-| 0.9236        | 7.0   | 1456 | 2.4789          |
-| 0.9156        | 8.0   | 1664 | 2.4831          |
-| 0.8987        | 9.0   | 1872 | 2.4963          |
-| 0.9008        | 10.0  | 2080 | 2.5021          |
-| 0.8976        | 11.0  | 2288 | 2.5050          |
-| 0.8941        | 12.0  | 2496 | 2.5107          |
-| 0.8878        | 13.0  | 2704 | 2.5123          |
-| 0.8896        | 14.0  | 2912 | 2.5120          |
-| 0.8797        | 15.0  | 3120 | 2.5119          |
 ### Framework versions
 - PEFT 0.11.1
-- Transformers 4.40.1
-- Pytorch 2.2.0+cu121
 - Datasets 2.19.2
 - Tokenizers 0.19.1

 license: gemma
 library_name: peft
 tags:
 - trl
 - sft
 - generated_from_trainer
 base_model: google/gemma-2b
 datasets:
+- generator
 model-index:
 - name: gemma2b-summarize-gemini1_5flash-128k
   results: []
 # gemma2b-summarize-gemini1_5flash-128k
+This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the generator dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.5573
 ## Model description
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
+- num_devices: 8
 - gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 10
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 1.0978        | 1.0   | 104  | 2.4831          |
+| 0.9985        | 2.0   | 208  | 2.4666          |
+| 0.9543        | 3.0   | 312  | 2.4561          |
+| 0.92          | 4.0   | 416  | 2.4799          |
+| 0.9016        | 5.0   | 520  | 2.4990          |
+| 0.8871        | 6.0   | 624  | 2.5250          |
+| 0.8635        | 7.0   | 728  | 2.5363          |
+| 0.8535        | 8.0   | 832  | 2.5546          |
+| 0.845         | 9.0   | 936  | 2.5566          |
+| 0.853         | 10.0  | 1040 | 2.5573          |
 ### Framework versions
 - PEFT 0.11.1
+- Transformers 4.41.2
+- Pytorch 2.3.1+cu121
 - Datasets 2.19.2
 - Tokenizers 0.19.1
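
Review note: the hyperparameter and results changes in this README diff can be cross-checked with simple arithmetic. The sketch below assumes a per-device train batch size of 8, which is not shown in the diff and is only inferred from 64 / (4 devices × 2 accumulation steps). The helper names are hypothetical. All other numbers are copied from the two sides of the diff.

```python
# Values copied from the README diff. PER_DEVICE_TRAIN_BATCH is an assumption
# inferred from total_train_batch_size / (num_devices * grad_accum) = 64 / (4 * 2).
PER_DEVICE_TRAIN_BATCH = 8  # assumed, not shown in the diff
EVAL_BATCH = 8              # eval_batch_size from the README
GRAD_ACCUM = 2              # gradient_accumulation_steps from the README


def total_train_batch(num_devices: int) -> int:
    """Effective train batch size per optimizer step."""
    return PER_DEVICE_TRAIN_BATCH * num_devices * GRAD_ACCUM


def total_eval_batch(num_devices: int) -> int:
    """Effective eval batch size (no gradient accumulation at eval time)."""
    return EVAL_BATCH * num_devices


def best_epoch(val_losses: list[float]) -> tuple[int, float]:
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


# Steps per epoch, read from the first row of each training-results table.
old_run = {"devices": 4, "steps_per_epoch": 208}
new_run = {"devices": 8, "steps_per_epoch": 104}

# Validation loss per epoch, copied from the two tables.
old_val = [2.5162, 2.4574, 2.4455, 2.4518, 2.4578, 2.4685, 2.4789, 2.4831,
           2.4963, 2.5021, 2.5050, 2.5107, 2.5123, 2.5120, 2.5119]
new_val = [2.4831, 2.4666, 2.4561, 2.4799, 2.4990,
           2.5250, 2.5363, 2.5546, 2.5566, 2.5573]

if __name__ == "__main__":
    for name, run in (("old", old_run), ("new", new_run)):
        print(name,
              "train batch:", total_train_batch(run["devices"]),
              "eval batch:", total_eval_batch(run["devices"]),
              "sequences per epoch (upper bound):",
              run["steps_per_epoch"] * total_train_batch(run["devices"]))
    print("old best epoch:", best_epoch(old_val))
    print("new best epoch:", best_epoch(new_val))
```

Under that assumption, doubling the device count doubles the effective batch size and halves the steps per epoch, so both runs cover the same amount of data per epoch. In both runs the lowest validation loss occurs at epoch 3, and the reported final loss (2.5119 before, 2.5573 after) is higher than that minimum.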

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4814f7f66589aaadfed01dc32a18d8571f439227757d94fe2ae89ec07c1876de
 size 78480320

 version https://git-lfs.github.com/spec/v1
+oid sha256:9a18143e465d21dcbf89d18140313cb39c82bc6b7dabfc909f3bac4791dbce41
 size 78480320

all_results.json CHANGED

@@ -1,14 +1,9 @@
 {
-    "epoch": 15.0,
-    "eval_loss": 2.511939525604248,
-    "eval_runtime": 0.5077,
-    "eval_samples": 25,
-    "eval_samples_per_second": 19.697,
-    "eval_steps_per_second": 1.97,
-    "total_flos": 2.443473144564941e+18,
-    "train_loss": 0.9840393823690904,
-    "train_runtime": 10705.8296,
     "train_samples": 126706,
-    "train_samples_per_second": 18.636,
-    "train_steps_per_second": 0.291
 }

 {
+    "epoch": 10.0,
+    "total_flos": 1.6530423313701274e+18,
+    "train_loss": 0.977329820394516,
+    "train_runtime": 7126.3412,
     "train_samples": 126706,
+    "train_samples_per_second": 18.665,
+    "train_steps_per_second": 0.146
 }
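
Review note: the throughput fields in this file can be checked against the step counts in the README tables. The sketch below is a sanity check, not part of the training code. The dictionary layout and function names are hypothetical, and the total step counts (3120 and 1040) are taken from the last row of each README training-results table.

```python
# Numbers copied from the old and new sides of all_results.json, plus the
# final step count from the corresponding README training-results table.
old = {"epochs": 15, "steps": 3120, "train_runtime": 10705.8296,
       "train_samples_per_second": 18.636, "train_steps_per_second": 0.291}
new = {"epochs": 10, "steps": 1040, "train_runtime": 7126.3412,
       "train_samples_per_second": 18.665, "train_steps_per_second": 0.146}


def steps_per_second(run: dict) -> float:
    """Optimizer steps per second, rounded like the reported field."""
    return round(run["steps"] / run["train_runtime"], 3)


def samples_per_epoch(run: dict) -> float:
    """Samples processed per epoch, implied by throughput and runtime."""
    return run["train_samples_per_second"] * run["train_runtime"] / run["epochs"]


if __name__ == "__main__":
    for name, run in (("old", old), ("new", new)):
        print(name,
              "steps/s:", steps_per_second(run),
              "implied samples/epoch:", round(samples_per_epoch(run)))
```

Both runs imply roughly 13,300 samples per epoch at nearly the same samples-per-second rate, so the halved `train_steps_per_second` reflects the larger effective batch rather than slower training. That figure differs from the `train_samples` value of 126706. The diff does not explain the gap, and one possible cause (an assumption) is that examples are packed into fixed-length sequences before training.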

runs/Jun10_07-38-05_48ddfe8e991f/events.out.tfevents.1718005108.48ddfe8e991f.245392.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:45270c1444049457bf3bf1086fc1c2bb641972d25d6781cd50553d6f477cf7a1
-size 50349

 version https://git-lfs.github.com/spec/v1
+oid sha256:1fd5987d98a2ba0a61e0d5e24e4a2f95a1c2c2baefe2e81f297e0f8aa8ab8684
+size 52662

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
-    "epoch": 15.0,
-    "total_flos": 2.443473144564941e+18,
-    "train_loss": 0.9840393823690904,
-    "train_runtime": 10705.8296,
     "train_samples": 126706,
-    "train_samples_per_second": 18.636,
-    "train_steps_per_second": 0.291
 }

 {
+    "epoch": 10.0,
+    "total_flos": 1.6530423313701274e+18,
+    "train_loss": 0.977329820394516,
+    "train_runtime": 7126.3412,
     "train_samples": 126706,
+    "train_samples_per_second": 18.665,
+    "train_steps_per_second": 0.146
 }

trainer_state.json CHANGED

The diff for this file is too large to render.