Model save

Files changed (7)

README.md +21 -26
all_results.json +4 -4
config.json +1 -1
runs/Aug25_00-33-37_ip-10-0-9-154.ec2.internal/events.out.tfevents.1724547401.ip-10-0-9-154.ec2.internal.32573.0 +3 -0
train_results.json +4 -4
trainer_state.json +4 -4
training_args.bin +1 -1

README.md CHANGED

@@ -3,15 +3,10 @@ library_name: transformers
 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
-- alignment-handbook
-- trl
-- dpo
-- generated_from_trainer
 - trl
 - dpo
 - generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
@@ -22,17 +17,17 @@ should probably proofread and complete it, then remove this comment. -->
 # zephyr-7b-dpo-full
-This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3148
-- Rewards/chosen: -4.6400
-- Rewards/rejected: -8.9971
-- Rewards/accuracies: 0.8178
-- Rewards/margins: 4.3571
-- Logps/rejected: -1189.8031
-- Logps/chosen: -754.9667
-- Logits/rejected: 0.6135
-- Logits/chosen: -0.2969
 ## Model description
@@ -67,16 +62,16 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.5385        | 0.1152 | 100  | 0.4593          | -1.4246        | -2.3741          | 0.7539             | 0.9495          | -527.4997      | -433.4271    | -2.8749         | -2.9012       |
-| 0.4369        | 0.2303 | 200  | 0.3590          | -3.0332        | -5.3301          | 0.7915             | 2.2969          | -823.1062      | -594.2914    | -1.1798         | -1.5078       |
-| 0.4119        | 0.3455 | 300  | 0.3369          | -3.8623        | -7.0684          | 0.8156             | 3.2061          | -996.9340      | -677.2002    | -0.1140         | -0.6166       |
-| 0.3964        | 0.4607 | 400  | 0.3311          | -4.6245        | -8.3800          | 0.8178             | 3.7555          | -1128.0946     | -753.4187    | 0.2313          | -0.6209       |
-| 0.3858        | 0.5759 | 500  | 0.3247          | -4.0345        | -7.5975          | 0.8167             | 3.5630          | -1049.8429     | -694.4181    | 0.1893          | -0.7776       |
-| 0.4031        | 0.6910 | 600  | 0.3191          | -4.5734        | -8.5306          | 0.8201             | 3.9572          | -1143.1573     | -748.3096    | 0.6163          | -0.2605       |
-| 0.4007        | 0.8062 | 700  | 0.3171          | -4.6204        | -8.9933          | 0.8178             | 4.3729          | -1189.4250     | -753.0112    | 0.4411          | -0.4982       |
-| 0.3644        | 0.9214 | 800  | 0.3152          | -4.6496        | -9.0247          | 0.8184             | 4.3751          | -1192.5621     | -755.9323    | 0.6049          | -0.3096       |
 ### Framework versions

 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
 - trl
 - dpo
+- alignment-handbook
 - generated_from_trainer
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
 # zephyr-7b-dpo-full
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
+- Logits/chosen: -0.3096
+- Logits/rejected: 0.6049
+- Logps/chosen: -755.9323
+- Logps/rejected: -1192.5621
+- Loss: 0.3152
+- Rewards/accuracies: 0.8184
+- Rewards/chosen: -4.6496
+- Rewards/margins: 4.3751
+- Rewards/rejected: -9.0247
 ## Model description
 ### Training results
+| Training Loss | Epoch  | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
+|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
+| 0.5385        | 0.1152 | 100  | -2.9012       | -2.8749         | -433.4271    | -527.4997      | 0.4593          | 0.7539             | -1.4246        | 0.9495          | -2.3741          |
+| 0.4369        | 0.2303 | 200  | -1.5078       | -1.1798         | -594.2914    | -823.1062      | 0.3590          | 0.7915             | -3.0332        | 2.2969          | -5.3301          |
+| 0.4119        | 0.3455 | 300  | -0.6166       | -0.1140         | -677.2002    | -996.9340      | 0.3369          | 0.8156             | -3.8623        | 3.2061          | -7.0684          |
+| 0.3964        | 0.4607 | 400  | -0.6209       | 0.2313          | -753.4187    | -1128.0946     | 0.3311          | 0.8178             | -4.6245        | 3.7555          | -8.3800          |
+| 0.3858        | 0.5759 | 500  | -0.7776       | 0.1893          | -694.4181    | -1049.8429     | 0.3247          | 0.8167             | -4.0345        | 3.5630          | -7.5975          |
+| 0.4031        | 0.6910 | 600  | -0.2605       | 0.6163          | -748.3096    | -1143.1573     | 0.3191          | 0.8201             | -4.5734        | 3.9572          | -8.5306          |
+| 0.4007        | 0.8062 | 700  | -0.4982       | 0.4411          | -753.0112    | -1189.4250     | 0.3171          | 0.8178             | -4.6204        | 4.3729          | -8.9933          |
+| 0.3644        | 0.9214 | 800  | -0.3096       | 0.6049          | -755.9323    | -1192.5621     | 0.3152          | 0.8184             | -4.6496        | 4.3751          | -9.0247          |
 ### Framework versions
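
The table rows on the added side appear to carry the same values as the removed side, only with the metric columns in a different order. One way to sanity-check a reordered table is to test a relation that holds regardless of column order. The sketch below assumes that `Rewards/margins` should equal `Rewards/chosen - Rewards/rejected` up to rounding (the usual DPO definition; this is an assumption, not something the diff states). `ROWS`, `margin_mismatches`, and the tolerance are illustrative names and values, not part of the repository.

```python
# Rows copied from the "+" side of the training-results table:
# (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (100, -1.4246, -2.3741, 0.9495),
    (200, -3.0332, -5.3301, 2.2969),
    (300, -3.8623, -7.0684, 3.2061),
    (400, -4.6245, -8.3800, 3.7555),
    (500, -4.0345, -7.5975, 3.5630),
    (600, -4.5734, -8.5306, 3.9572),
    (700, -4.6204, -8.9933, 4.3729),
    (800, -4.6496, -9.0247, 4.3751),
]


def margin_mismatches(rows, tol=2e-4):
    """Return the steps whose reported margin differs from chosen - rejected.

    The tolerance allows for the 4-decimal rounding used in the model card.
    """
    return [
        step
        for step, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


mismatches = margin_mismatches(ROWS)
print("steps with inconsistent margins:", mismatches)
```

An empty list means every row is internally consistent under that assumption; it does not by itself show that the two sides of the diff match each other.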

all_results.json CHANGED

@@ -14,9 +14,9 @@
     "eval_samples_per_second": 10.983,
     "eval_steps_per_second": 0.344,
     "total_flos": 0.0,
-    "train_loss": 0.42798474719447477,
-    "train_runtime": 41692.5547,
     "train_samples": 111134,
-    "train_samples_per_second": 2.666,
-    "train_steps_per_second": 0.021
 }

     "eval_samples_per_second": 10.983,
     "eval_steps_per_second": 0.344,
     "total_flos": 0.0,
+    "train_loss": 0.0,
+    "train_runtime": 0.0165,
     "train_samples": 111134,
+    "train_samples_per_second": 6754818.797,
+    "train_steps_per_second": 52757.776
 }
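
The throughput fields in this file can be cross-checked against each other with simple arithmetic. The sketch below assumes that `train_samples_per_second` is `train_samples / train_runtime` and that `train_steps_per_second` is the final step count (868, taken from `trainer_state.json`) divided by `train_runtime`; both are assumptions about how the metrics were derived. The helper names are illustrative.

```python
# Values copied from the removed ("-") and added ("+") sides of the diff.
TRAIN_SAMPLES = 111134
FINAL_STEP = 868  # "step" in trainer_state.json

OLD_RUNTIME_S = 41692.5547
NEW_SAMPLES_PER_S = 6754818.797
NEW_STEPS_PER_S = 52757.776


def rate(count, runtime_s):
    """Items processed per second of reported runtime."""
    return count / runtime_s


def implied_runtime(count, per_second):
    """Runtime in seconds implied by a count and a reported rate."""
    return count / per_second


# Removed side: recompute the rates from the reported runtime.
old_samples_per_s = round(rate(TRAIN_SAMPLES, OLD_RUNTIME_S), 3)
old_steps_per_s = round(rate(FINAL_STEP, OLD_RUNTIME_S), 3)

# Added side: recover the runtime implied by each reported rate.
runtime_from_samples = implied_runtime(TRAIN_SAMPLES, NEW_SAMPLES_PER_S)
runtime_from_steps = implied_runtime(FINAL_STEP, NEW_STEPS_PER_S)

print(old_samples_per_s, old_steps_per_s)
print(round(runtime_from_samples, 4), round(runtime_from_steps, 4))
```

Under these assumptions the removed values reproduce the reported 2.666 samples/s and 0.021 steps/s, and both added rates imply a runtime that rounds to the reported 0.0165 s. Together with `train_loss` of 0.0, the added values are consistent with a run that returned almost immediately without performing training steps, although the diff alone does not confirm why.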

config.json CHANGED

@@ -22,6 +22,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.44.1",
-  "use_cache": true,
   "vocab_size": 32000
 }

   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.44.1",
+  "use_cache": false,
   "vocab_size": 32000
 }

runs/Aug25_00-33-37_ip-10-0-9-154.ec2.internal/events.out.tfevents.1724547401.ip-10-0-9-154.ec2.internal.32573.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:728af975b19adca6451e531d165002e5fd21f6dde541a5c6ef0af6f839f98771
+size 6511

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 0.9997120644975526,
     "total_flos": 0.0,
-    "train_loss": 0.42798474719447477,
-    "train_runtime": 41692.5547,
     "train_samples": 111134,
-    "train_samples_per_second": 2.666,
-    "train_steps_per_second": 0.021
 }

 {
     "epoch": 0.9997120644975526,
     "total_flos": 0.0,
+    "train_loss": 0.0,
+    "train_runtime": 0.0165,
     "train_samples": 111134,
+    "train_samples_per_second": 6754818.797,
+    "train_steps_per_second": 52757.776
 }

trainer_state.json CHANGED

@@ -1445,10 +1445,10 @@
       "epoch": 0.9997120644975526,
       "step": 868,
       "total_flos": 0.0,
-      "train_loss": 0.42798474719447477,
-      "train_runtime": 41692.5547,
-      "train_samples_per_second": 2.666,
-      "train_steps_per_second": 0.021
     }
   ],
   "logging_steps": 10,

       "epoch": 0.9997120644975526,
       "step": 868,
       "total_flos": 0.0,
+      "train_loss": 0.0,
+      "train_runtime": 0.0165,
+      "train_samples_per_second": 6754818.797,
+      "train_steps_per_second": 52757.776
     }
   ],
   "logging_steps": 10,

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9e7ab4a3dc125627368f9b36260b6b41eff1cfcfcd6f57333c4968cea354b935
 size 7480

 version https://git-lfs.github.com/spec/v1
+oid sha256:cd37cb1dc1df49b1c196baae69716974588d4499feec8711bb52413e033a8a20
 size 7480