Aa123564 committed
Commit d7019e6 · verified · 1 Parent(s): 454c418

Model save

README.md CHANGED
@@ -4,6 +4,7 @@ library_name: peft
 tags:
 - trl
 - sft
+- alignment-handbook
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
 datasets:
@@ -19,8 +20,6 @@ should probably proofread and complete it, then remove this comment. -->
 # zephyr-7b-sft-qlora
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.7922
 
 ## Model description
 
@@ -52,19 +51,15 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
-- mixed_precision_training: Native AMP
 
 ### Training results
 
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.7616        | 0.9999 | 9785 | 0.7922          |
 
 
 ### Framework versions
 
-- PEFT 0.7.1
-- Transformers 4.40.1
+- PEFT 0.9.0
+- Transformers 4.39.3
 - Pytorch 2.1.2
-- Datasets 2.19.0
-- Tokenizers 0.19.1
+- Datasets 2.18.0
+- Tokenizers 0.15.2
adapter_config.json CHANGED
@@ -20,12 +20,14 @@
   "revision": null,
   "target_modules": [
     "up_proj",
+    "q_proj",
     "k_proj",
+    "o_proj",
     "gate_proj",
     "v_proj",
-    "o_proj",
-    "down_proj",
-    "q_proj"
+    "down_proj"
   ],
-  "task_type": "CAUSAL_LM"
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
 }
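The module list is unchanged in substance; the diff only reorders it and adds the `use_dora`/`use_rslora` fields that newer peft releases serialize by default. For orientation, the updated config corresponds to a peft `LoraConfig` along these lines. This is a minimal sketch: the rank, alpha, and dropout values are placeholders, since this hunk does not show them.

```python
# Minimal sketch of the updated adapter config as a peft LoraConfig.
# r, lora_alpha, and lora_dropout are placeholders -- this hunk does not
# show them, so substitute the values from the full adapter_config.json.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # placeholder, not shown in this diff
    lora_alpha=32,   # placeholder, not shown in this diff
    lora_dropout=0.05,  # placeholder, not shown in this diff
    target_modules=[
        "up_proj", "q_proj", "k_proj", "o_proj",
        "gate_proj", "v_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
    use_dora=False,    # field written by newer peft releases
    use_rslora=False,  # field written by newer peft releases
)
```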
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:309f0cd223f5f25bbff9bd251fcd3f8f6d44f27ab421f1bac96fc8ac9a99b762
-size 167832240
+oid sha256:527d98212d9cdb166d3f3175fd164a01ccdfc72041f5c46b14e7a06f06267de5
+size 83946192
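The new adapter checkpoint is roughly half the previous size (83,946,192 vs 167,832,240 bytes). To try the adapter, a loading sketch along these lines should work; the repo id below is an assumption inferred from the card title and committer, so substitute the actual repository.

```python
# Minimal sketch: load the LoRA adapter on top of the base model.
# The repo id is an assumption inferred from the model card; substitute
# the actual adapter repository on the Hub.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Aa123564/zephyr-7b-sft-qlora"  # assumed repo id

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```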
all_results.json CHANGED
@@ -1,9 +1,14 @@
 {
-    "epoch": 0.9999489039905983,
-    "total_flos": 2.752048962850731e+19,
-    "train_loss": 0.9653273651900801,
-    "train_runtime": 48538.6259,
-    "train_samples": 1299087,
-    "train_samples_per_second": 6.451,
-    "train_steps_per_second": 0.202
+    "epoch": 0.99,
+    "eval_loss": 0.9485585689544678,
+    "eval_runtime": 565.6562,
+    "eval_samples": 23109,
+    "eval_samples_per_second": 27.28,
+    "eval_steps_per_second": 0.854,
+    "total_flos": 1.2254141370096157e+19,
+    "train_loss": 0.0,
+    "train_runtime": 0.0433,
+    "train_samples": 2078,
+    "train_samples_per_second": 31910.696,
+    "train_steps_per_second": 993.599
 }
eval_results.json CHANGED
@@ -1,8 +1,8 @@
 {
-    "epoch": 2.0,
-    "eval_loss": 0.7585077881813049,
-    "eval_runtime": 7252.7987,
-    "eval_samples": 821583,
-    "eval_samples_per_second": 27.291,
-    "eval_steps_per_second": 0.853
+    "epoch": 0.9998852553069421,
+    "eval_loss": 0.9485585689544678,
+    "eval_runtime": 565.6562,
+    "eval_samples": 23109,
+    "eval_samples_per_second": 27.28,
+    "eval_steps_per_second": 0.854
 }
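Assuming `eval_loss` is the usual mean token-level cross-entropy, perplexity is its exponential, which puts this commit at roughly 2.58 versus roughly 2.14 for the values it replaces:

```python
import math

# Perplexity = exp(mean cross-entropy); values taken from eval_results.json.
print(math.exp(0.7585077881813049))  # ~2.14, previous eval_loss
print(math.exp(0.9485585689544678))  # ~2.58, eval_loss in this commit
```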
runs/May19_07-08-46_training-queue-st-p4d-24xlarge-1/events.out.tfevents.1716102745.training-queue-st-p4d-24xlarge-1.3797.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:326af8353171ecf25d6d218ccff10fb8bce5d9afca88ecacbd8108c2ffe2d43d
+size 5558
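The added file is a TensorBoard event log, stored as a Git LFS pointer. If you pull the actual file, it can be read with TensorBoard's event reader; a minimal sketch, assuming the tensorboard package is installed:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the directory (or file) containing the events.out.tfevents.* log.
ea = EventAccumulator("runs/May19_07-08-46_training-queue-st-p4d-24xlarge-1")
ea.Reload()
print(ea.Tags())  # e.g. scalar tags such as train/loss, if present
```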
tokenizer.json CHANGED
@@ -134,7 +134,6 @@
   "end_of_word_suffix": null,
   "fuse_unk": true,
   "byte_fallback": true,
-  "ignore_merges": false,
   "vocab": {
     "<unk>": 0,
     "<s>": 1,
train_results.json CHANGED
@@ -1,9 +1,8 @@
 {
-    "epoch": 0.9999489039905983,
-    "total_flos": 2.752048962850731e+19,
-    "train_loss": 0.9653273651900801,
-    "train_runtime": 48538.6259,
-    "train_samples": 1299087,
-    "train_samples_per_second": 6.451,
-    "train_steps_per_second": 0.202
+    "epoch": 0.99,
+    "train_loss": 0.0,
+    "train_runtime": 0.0433,
+    "train_samples": 2078,
+    "train_samples_per_second": 31910.696,
+    "train_steps_per_second": 993.599
 }
trainer_state.json CHANGED
The diff for this file is too large to render.
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eaf129790d02402ef80ffe8262e569c9327ca62ae50ab7f0e7b51b9e9c4f0a2b
-size 5112
+oid sha256:2da86413667afdb3ef9626533f4a1f9a0770272a3f50fe1d8c7f322f97e85c2a
+size 5048
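training_args.bin is the pickled `TrainingArguments` object that transformers' `Trainer` saves alongside checkpoints, which is why its hash and size change whenever the run configuration does. A minimal sketch for inspecting it (unpickling executes arbitrary code, so only load files you trust):

```python
import torch

# Unpickle the saved TrainingArguments. weights_only=False is required on
# newer torch versions because this is a pickled Python object, not a plain
# tensor file -- only do this for files you trust.
args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.num_train_epochs, args.lr_scheduler_type)
```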