---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
base_model: mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-gpo-v0-i1
  results: []
---

# zephyr-7b-gpo-v0-i1

This model is a fine-tuned version of [DUAL-GPO/zephyr-7b-gpo-update3-i0](https://huggingface.co/DUAL-GPO/zephyr-7b-gpo-update3-i0) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:

- Loss: 0.1128
- Rewards/chosen: -0.3200
- Rewards/rejected: -0.3706
- Rewards/accuracies: 0.4955
- Rewards/margins: 0.0506
- Logps/rejected: -621.5818
- Logps/chosen: -585.8446
- Logits/rejected: -1.9142
- Logits/chosen: -2.0965

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 2
- total_train_batch_size: 12
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3416        | 0.02  | 100  | 0.0447          | -0.0994        | -0.1161          | 0.5883             | 0.0167          | -367.1221      | -365.3260    | -1.7202         | -1.8827       |
| 0.2571        | 0.05  | 200  | 0.0858          | -0.1849        | -0.2159          | 0.4790             | 0.0310          | -466.8627      | -450.7509    | -1.8599         | -2.0364       |
| 0.2771        | 0.07  | 300  | 0.0910          | -0.2419        | -0.2769          | 0.4775             | 0.0350          | -527.8735      | -507.7906    | -1.9087         | -2.0909       |
| 0.2561        | 0.1   | 400  | 0.1127          | -0.4661        | -0.5086          | 0.4895             | 0.0425          | -759.5652      | -731.9658    | -1.9571         | -2.1511       |
| 0.2604        | 0.12  | 500  | 0.0826          | -0.3221        | -0.3613          | 0.4835             | 0.0393          | -612.2919      | -587.9281    | -1.8643         | -2.0449       |
| 0.2778        | 0.14  | 600  | 0.1033          | -0.2940        | -0.3303          | 0.4760             | 0.0363          | -581.3212      | -559.9218    | -1.8588         | -2.0387       |
| 0.2631        | 0.17  | 700  | 0.1084          | -0.3587        | -0.4024          | 0.4865             | 0.0437          | -653.3798      | -624.5897    | -1.8458         | -2.0252       |
| 0.2264        | 0.19  | 800  | 0.1158          | -0.2355        | -0.2734          | 0.4731             | 0.0378          | -524.3303      | -501.3899    | -1.8726         | -2.0501       |
| 0.2593        | 0.22  | 900  | 0.1048          | -0.2730        | -0.3214          | 0.4865             | 0.0485          | -572.4186      | -538.8648    | -1.7883         | -1.9593       |
| 0.2248        | 0.24  | 1000 | 0.1122          | -0.2753        | -0.3216          | 0.4760             | 0.0463          | -572.5806      | -541.1548    | -1.8308         | -2.0088       |
| 0.2345        | 0.26  | 1100 | 0.1249          | -0.2594        | -0.2977          | 0.4581             | 0.0382          | -548.6310      | -525.3046    | -1.8628         | -2.0406       |
| 0.2           | 0.29  | 1200 | 0.1212          | -0.3796        | -0.4250          | 0.4925             | 0.0454          | -675.9450      | -645.4562    | -1.8382         | -2.0177       |
| 0.2246        | 0.31  | 1300 | 0.1102          | -0.2548        | -0.3030          | 0.4850             | 0.0482          | -553.9783      | -520.6531    | -1.9584         | -2.1449       |
| 0.2481        | 0.34  | 1400 | 0.1082          | -0.2988        | -0.3545          | 0.4955             | 0.0557          | -605.4994      | -564.6545    | -1.8877         | -2.0708       |
| 0.232         | 0.36  | 1500 | 0.1053          | -0.2421        | -0.2907          | 0.4910             | 0.0486          | -541.7161      | -508.0170    | -1.9404         | -2.1256       |
| 0.2351        | 0.38  | 1600 | 0.1098          | -0.3383        | -0.3864          | 0.4775             | 0.0481          | -637.3510      | -604.1564    | -1.8506         | -2.0290       |
| 0.2622        | 0.41  | 1700 | 0.1196          | -0.2614        | -0.3121          | 0.4820             | 0.0507          | -563.0452      | -527.2568    | -1.9197         | -2.1016       |
| 0.2043        | 0.43  | 1800 | 0.1257          | -0.2798        | -0.3252          | 0.4820             | 0.0454          | -576.1965      | -545.7018    | -1.9177         | -2.0980       |
| 0.2205        | 0.46  | 1900 | 0.1154          | -0.4037        | -0.4629          | 0.4850             | 0.0592          | -713.9170      | -669.5957    | -1.8198         | -1.9972       |
| 0.2156        | 0.48  | 2000 | 0.1103          | -0.2727        | -0.3161          | 0.4865             | 0.0434          | -567.0794      | -538.5911    | -1.9234         | -2.1044       |
| 0.2308        | 0.5   | 2100 | 0.1163          | -0.4322        | -0.4852          | 0.4925             | 0.0531          | -736.1898      | -698.0287    | -1.8013         | -1.9761       |
| 0.2204        | 0.53  | 2200 | 0.1083          | -0.3224        | -0.3712          | 0.4940             | 0.0488          | -622.1750      | -588.3229    | -1.8487         | -2.0260       |
| 0.2303        | 0.55  | 2300 | 0.1192          | -0.3117        | -0.3667          | 0.4940             | 0.0551          | -617.7075      | -577.5367    | -1.8679         | -2.0473       |
| 0.231         | 0.58  | 2400 | 0.1068          | -0.3476        | -0.4008          | 0.5                | 0.0532          | -651.7600      | -613.4935    | -1.8167         | -1.9926       |
| 0.2252        | 0.6   | 2500 | 0.1240          | -0.3568        | -0.4154          | 0.4940             | 0.0586          | -666.3873      | -622.7224    | -1.9124         | -2.0972       |
| 0.2445        | 0.62  | 2600 | 0.1240          | -0.3426        | -0.4003          | 0.4805             | 0.0576          | -651.2365      | -608.5200    | -1.9230         | -2.1073       |
| 0.2212        | 0.65  | 2700 | 0.1103          | -0.2894        | -0.3362          | 0.4925             | 0.0468          | -587.1506      | -555.2968    | -1.9049         | -2.0860       |
| 0.2301        | 0.67  | 2800 | 0.1073          | -0.2754        | -0.3278          | 0.5105             | 0.0524          | -578.7745      | -541.2313    | -1.9024         | -2.0838       |
| 0.2099        | 0.7   | 2900 | 0.1191          | -0.3108        | -0.3657          | 0.5015             | 0.0549          | -616.7156      | -576.6858    | -1.9182         | -2.1014       |
| 0.2072        | 0.72  | 3000 | 0.1120          | -0.3062        | -0.3563          | 0.4910             | 0.0500          | -607.2319      | -572.1099    | -1.9258         | -2.1090       |
| 0.2186        | 0.74  | 3100 | 0.1155          | -0.2960        | -0.3474          | 0.4985             | 0.0514          | -598.4005      | -561.9234    | -1.9031         | -2.0849       |
| 0.2743        | 0.77  | 3200 | 0.1121          | -0.2815        | -0.3314          | 0.4955             | 0.0499          | -582.3980      | -547.4086    | -1.9332         | -2.1170       |
| 0.1989        | 0.79  | 3300 | 0.1116          | -0.3235        | -0.3744          | 0.4850             | 0.0509          | -625.3889      | -589.4213    | -1.8977         | -2.0789       |
| 0.2258        | 0.82  | 3400 | 0.1093          | -0.3091        | -0.3603          | 0.4970             | 0.0512          | -611.2418      | -574.9766    | -1.9164         | -2.0989       |
| 0.2524        | 0.84  | 3500 | 0.1142          | -0.3383        | -0.3897          | 0.4910             | 0.0514          | -640.6893      | -604.2028    | -1.9130         | -2.0956       |
| 0.2202        | 0.86  | 3600 | 0.1173          | -0.3412        | -0.3925          | 0.4835             | 0.0513          | -643.4937      | -607.1244    | -1.9146         | -2.0973       |
| 0.2365        | 0.89  | 3700 | 0.1178          | -0.3273        | -0.3787          | 0.4850             | 0.0514          | -629.6786      | -593.2114    | -1.9279         | -2.1117       |
| 0.1894        | 0.91  | 3800 | 0.1152          | -0.3184        | -0.3694          | 0.4925             | 0.0509          | -620.3304      | -584.3237    | -1.9252         | -2.1088       |
| 0.2372        | 0.94  | 3900 | 0.1130          | -0.3155        | -0.3658          | 0.4940             | 0.0503          | -616.7926      | -581.3542    | -1.9194         | -2.1021       |
| 0.2029        | 0.96  | 4000 | 0.1133          | -0.3208        | -0.3715          | 0.4925             | 0.0507          | -622.4911      | -586.6887    | -1.9141         | -2.0964       |
| 0.2438        | 0.98  | 4100 | 0.1129          | -0.3199        | -0.3707          | 0.4940             | 0.0508          | -621.6636      | -585.7551    | -1.9140         | -2.0965       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
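
### Interpreting the reward metrics

The card does not document the training objective beyond the `dpo` tag, but the logged metrics follow the convention of TRL's DPO training: the implicit reward is β times the policy-to-reference log-probability ratio of a response, and `Rewards/margins` is the chosen reward minus the rejected reward. A minimal sketch of that relationship, where β = 0.1 and all log-probability values are illustrative assumptions rather than outputs of this model:

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def dpo_metrics(policy_chosen: float, policy_rejected: float,
                ref_chosen: float, ref_rejected: float,
                beta: float = 0.1) -> dict:
    """Sigmoid DPO loss plus the reward metrics a trainer would log."""
    r_chosen = implicit_reward(policy_chosen, ref_chosen, beta)
    r_rejected = implicit_reward(policy_rejected, ref_rejected, beta)
    margin = r_chosen - r_rejected
    # loss = -log(sigmoid(margin)); smaller when chosen outscores rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return {
        "loss": loss,
        "rewards/chosen": r_chosen,
        "rewards/rejected": r_rejected,
        "rewards/margins": margin,
    }


# Illustrative summed log-probabilities over response tokens (hypothetical values).
metrics = dpo_metrics(policy_chosen=-580.0, policy_rejected=-625.0,
                      ref_chosen=-577.0, ref_rejected=-621.0)
```

With these hypothetical inputs both rewards come out negative with a small positive margin, the same pattern as the table above: the policy's log-probabilities drift below the reference's on both responses, but drift further on the rejected one.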