Model save
README.md
CHANGED
@@ -2,14 +2,10 @@
 license: apache-2.0
 library_name: peft
 tags:
-- alignment-handbook
-- generated_from_trainer
 - trl
 - sft
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
-datasets:
-- HuggingFaceH4/ultrachat_200k
 model-index:
 - name: zephyr-7b-sft-qlora
   results: []
@@ -20,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-sft-qlora
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 0.7776
 
 ## Model description
 
@@ -42,14 +38,14 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 2
-- total_train_batch_size:
-- total_eval_batch_size:
+- total_train_batch_size: 64
+- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -57,15 +53,15 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step
-|
-|
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.749         | 1.0   | 325  | 0.7776          |
 
 
 ### Framework versions
 
 - PEFT 0.7.1
-- Transformers 4.
-- Pytorch 2.2.
-- Datasets 2.
+- Transformers 4.39.0.dev0
+- Pytorch 2.2.2+cu121
+- Datasets 2.18.0
 - Tokenizers 0.15.2
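The batch-size fields in the updated card are mutually consistent: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 8 × 4 × 2 = 64, and total_eval_batch_size = eval_batch_size × num_devices = 8 × 4 = 32.

Since the card declares library_name: peft, this commit ships a LoRA adapter rather than full model weights. Below is a minimal loading sketch, not a snippet from the card itself: the repo id is a placeholder, and peft's AutoPeftModelForCausalLM is relied on to resolve the base model from the adapter's adapter_config.json (standard behavior in peft >= 0.7).

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "your-username/zephyr-7b-sft-qlora"  # placeholder repo id (assumption)

# Reads adapter_config.json, downloads the base model
# (mistralai/Mistral-7B-v0.1), and attaches adapter_model.safetensors on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Tokenizer files are typically pushed alongside the adapter by the trainer.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

inputs = tokenizer("Deep learning is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```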
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b7db789cc50c06995a37c003083c8410dd62bf3d2f9e61b0d057fddd2e5238aa
 size 83946192
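The .safetensors entry is a Git LFS pointer, so the diff only changes the sha256 oid; the ~84 MB adapter blob itself lives in LFS storage, and its unchanged size is consistent with retraining the same LoRA configuration. A short sketch for checking a downloaded blob against the pointer above:

```python
import hashlib
import os

def verify_lfs_object(path: str, expected_oid: str, expected_size: int) -> bool:
    """Compare a local file against the sha256 oid and byte size of an LFS pointer."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks to avoid loading the whole blob into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid

print(verify_lfs_object(
    "adapter_model.safetensors",
    "b7db789cc50c06995a37c003083c8410dd62bf3d2f9e61b0d057fddd2e5238aa",
    83946192,
))
```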
all_results.json
CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss":
-    "train_runtime":
-    "train_samples":
-    "train_samples_per_second": 1.
-    "train_steps_per_second": 0.
+    "train_loss": 0.7600007471671472,
+    "train_runtime": 11976.0052,
+    "train_samples": 20787,
+    "train_samples_per_second": 1.736,
+    "train_steps_per_second": 0.027
 }
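The derived throughput fields follow directly from the raw counters (using the 325 optimizer steps shown in the card's training-results table):

```python
train_samples = 20787
train_runtime = 11976.0052  # seconds
steps = 325                 # from the training-results table in README.md

print(round(train_samples / train_runtime, 3))  # 1.736 = train_samples_per_second
print(round(steps / train_runtime, 3))          # 0.027 = train_steps_per_second
```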
runs/Apr06_23-53-51_ip-172-31-69-60.ec2.internal/events.out.tfevents.1712447651.ip-172-31-69-60.ec2.internal.1668.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:30389fd342652db733fa8356d7fa18104acdb8ad4385a698fa92fd3104c6b021
+size 19137
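This tfevents file is the TensorBoard log for the run. A sketch for reading it back locally; the tag names written by transformers' TensorBoard callback (e.g. "train/loss") are an assumption here, so the available tags are listed first:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = "runs/Apr06_23-53-51_ip-172-31-69-60.ec2.internal"
acc = EventAccumulator(run_dir)
acc.Reload()

print(acc.Tags()["scalars"])             # discover which scalar tags were logged
for event in acc.Scalars("train/loss"):  # assumed tag name; check the list above
    print(event.step, event.value)
```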
train_results.json
CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss":
-    "train_runtime":
-    "train_samples":
-    "train_samples_per_second": 1.
-    "train_steps_per_second": 0.
+    "train_loss": 0.7600007471671472,
+    "train_runtime": 11976.0052,
+    "train_samples": 20787,
+    "train_samples_per_second": 1.736,
+    "train_steps_per_second": 0.027
 }
trainer_state.json
CHANGED
The diff for this file is too large to render. See raw diff.
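Although the viewer cannot render it, trainer_state.json is plain JSON; the Trainer keeps its per-logging-step metrics in the "log_history" list, so the full loss curve can be recovered locally. A small sketch, assuming the standard trainer_state.json layout:

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

print(state["epoch"], state["global_step"])  # expected: 1.0, 325
for record in state["log_history"][:3]:      # first few logged entries
    print(record)
```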