# zephyr-7b-dpo-qlora
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the reward metrics are defined just after the list):
- Loss: 0.5299
- Rewards/chosen: -3.0720
- Rewards/rejected: -4.6492
- Rewards/accuracies: 0.7275
- Rewards/margins: 1.5772
- Logps/rejected: -728.1719
- Logps/chosen: -592.3389
- Logits/rejected: -1.2212
- Logits/chosen: -1.3455
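The metric names above follow trl's DPOTrainer logging convention. As a reference, here is a hedged sketch of the standard DPO quantities behind the reward columns; the temperature β and the frozen reference policy π_ref are not reported in this card.

```latex
% Implicit DPO reward of a completion y for prompt x, with beta the DPO temperature
% and pi_ref the frozen SFT reference policy:
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% Pairwise DPO loss over a (chosen, rejected) pair:
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\bigl( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \bigr)
```

Under this convention, Rewards/chosen and Rewards/rejected average the implicit reward over chosen and rejected completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen completion gets the higher implicit reward. The Logps columns are summed log-probabilities of the completions under the policy, and the Logits columns are mean logits over those completions.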
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
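As a rough illustration only, the values above might be wired into trl's DPOTrainer for QLoRA DPO training roughly as sketched below. The beta value, the LoRA settings, and the dataset preprocessing are assumptions not stated in this card, and the exact keyword names vary across trl versions (newer releases use `processing_class` instead of `tokenizer`).

```python
# Hedged sketch of a QLoRA DPO run with the hyperparameters listed above.
# beta, the LoRA settings, and dataset handling are assumptions, not values from this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization for the QLoRA base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Assumed LoRA adapter configuration (not reported in this card)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
    beta=0.1,  # assumption: a common Zephyr DPO value; not stated in this card
)

# NOTE: chosen/rejected are chat-formatted; depending on the trl version they may need
# to be flattened to strings with the tokenizer's chat template before training.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With `peft_config` supplied, DPOTrainer uses the frozen base weights as the implicit reference model, so no separate `ref_model` copy is needed.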
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6917 | 0.0262 | 400 | 0.6917 | 0.0029 | 0.0001 | 0.6350 | 0.0028 | -263.2442 | -284.8504 | -2.9734 | -3.0250 |
0.6787 | 0.0523 | 800 | 0.6800 | 0.0242 | -0.0033 | 0.6860 | 0.0276 | -263.5826 | -282.7138 | -2.9532 | -3.0046 |
0.6348 | 0.0785 | 1200 | 0.6376 | -0.0096 | -0.1486 | 0.6755 | 0.1390 | -278.1083 | -286.0981 | -2.8858 | -2.9319 |
0.629 | 0.1047 | 1600 | 0.6087 | -0.3993 | -0.6875 | 0.6785 | 0.2882 | -331.9969 | -325.0683 | -2.7749 | -2.8148 |
0.5602 | 0.1309 | 2000 | 0.5979 | -0.5708 | -0.9723 | 0.6855 | 0.4014 | -360.4759 | -342.2224 | -2.7488 | -2.7916 |
0.5783 | 0.1570 | 2400 | 0.5952 | -0.7444 | -1.2632 | 0.6910 | 0.5188 | -389.5722 | -359.5799 | -2.6852 | -2.7273 |
0.6364 | 0.1832 | 2800 | 0.6014 | -2.0557 | -2.8123 | 0.6970 | 0.7566 | -544.4844 | -490.7089 | -2.0799 | -2.1273 |
0.6807 | 0.2094 | 3200 | 0.5654 | -2.1440 | -3.0639 | 0.7030 | 0.9199 | -569.6410 | -499.5395 | -1.6977 | -1.7604 |
0.6616 | 0.2355 | 3600 | 0.5712 | -2.9371 | -3.9619 | 0.7165 | 1.0247 | -659.4373 | -578.8513 | -1.2775 | -1.3472 |
0.4475 | 0.2617 | 4000 | 0.5522 | -2.1606 | -3.0883 | 0.7250 | 0.9277 | -572.0762 | -501.1973 | -1.6222 | -1.6801 |
0.5934 | 0.2879 | 4400 | 0.5452 | -2.0993 | -3.0686 | 0.7150 | 0.9693 | -570.1054 | -495.0656 | -1.5863 | -1.6559 |
0.5422 | 0.3141 | 4800 | 0.5520 | -2.7041 | -3.8442 | 0.7220 | 1.1401 | -647.6720 | -555.5510 | -1.5167 | -1.5930 |
0.6307 | 0.3402 | 5200 | 0.5378 | -2.2755 | -3.3838 | 0.7285 | 1.1083 | -601.6280 | -512.6918 | -1.6752 | -1.7599 |
0.7039 | 0.3664 | 5600 | 0.5306 | -1.7946 | -2.8494 | 0.7250 | 1.0548 | -548.1910 | -464.5987 | -1.6121 | -1.6982 |
0.6561 | 0.3926 | 6000 | 0.5516 | -2.6777 | -4.0196 | 0.7205 | 1.3418 | -665.2089 | -552.9131 | -1.6257 | -1.7129 |
0.5698 | 0.4188 | 6400 | 0.5181 | -2.1847 | -3.1985 | 0.7365 | 1.0138 | -583.0958 | -503.6094 | -1.6584 | -1.7391 |
0.5919 | 0.4449 | 6800 | 0.5219 | -1.9491 | -3.1280 | 0.7195 | 1.1790 | -576.0514 | -480.0444 | -1.6888 | -1.7826 |
0.6161 | 0.4711 | 7200 | 0.5417 | -2.7779 | -4.2107 | 0.7335 | 1.4328 | -684.3200 | -562.9326 | -1.4277 | -1.5325 |
0.4585 | 0.4973 | 7600 | 0.5326 | -2.4424 | -3.8173 | 0.7355 | 1.3748 | -644.9775 | -529.3820 | -1.5104 | -1.6091 |
0.7168 | 0.5234 | 8000 | 0.5298 | -2.7451 | -4.1021 | 0.7390 | 1.3569 | -673.4548 | -559.6511 | -1.3613 | -1.4625 |
0.7179 | 0.5496 | 8400 | 0.5450 | -3.1455 | -4.6991 | 0.7330 | 1.5536 | -733.1592 | -599.6882 | -1.2796 | -1.3950 |
0.4405 | 0.5758 | 8800 | 0.5088 | -1.9634 | -3.1323 | 0.7425 | 1.1689 | -576.4830 | -481.4787 | -1.5418 | -1.6311 |
0.4464 | 0.6020 | 9200 | 0.5306 | -2.5354 | -3.9140 | 0.7325 | 1.3786 | -654.6471 | -538.6789 | -1.3558 | -1.4605 |
0.43 | 0.6281 | 9600 | 0.5292 | -2.7495 | -4.1617 | 0.7335 | 1.4122 | -679.4191 | -560.0843 | -1.2192 | -1.3258 |
0.48 | 0.6543 | 10000 | 0.5317 | -2.5185 | -3.9464 | 0.7245 | 1.4279 | -657.8862 | -536.9896 | -1.3340 | -1.4473 |
0.7352 | 0.6805 | 10400 | 0.5257 | -2.7204 | -4.1745 | 0.7315 | 1.4541 | -680.6992 | -557.1738 | -1.3220 | -1.4356 |
0.6986 | 0.7066 | 10800 | 0.5242 | -2.8515 | -4.3094 | 0.7300 | 1.4580 | -694.1929 | -570.2861 | -1.2609 | -1.3721 |
0.4944 | 0.7328 | 11200 | 0.5282 | -2.8438 | -4.3275 | 0.7320 | 1.4837 | -695.9977 | -569.5184 | -1.2780 | -1.3930 |
0.3577 | 0.7590 | 11600 | 0.5159 | -2.7874 | -4.1731 | 0.7345 | 1.3857 | -680.5639 | -563.8783 | -1.3489 | -1.4592 |
0.602 | 0.7852 | 12000 | 0.5213 | -2.9605 | -4.3944 | 0.7315 | 1.4339 | -702.6897 | -581.1863 | -1.2926 | -1.4077 |
0.4698 | 0.8113 | 12400 | 0.5320 | -3.2528 | -4.8286 | 0.7300 | 1.5759 | -746.1134 | -610.4158 | -1.1834 | -1.3076 |
0.4796 | 0.8375 | 12800 | 0.5180 | -2.7532 | -4.1875 | 0.7325 | 1.4343 | -681.9944 | -560.4576 | -1.2848 | -1.3996 |
0.4354 | 0.8637 | 13200 | 0.5226 | -2.8473 | -4.3400 | 0.7335 | 1.4927 | -697.2530 | -569.8687 | -1.2477 | -1.3671 |
0.4068 | 0.8898 | 13600 | 0.5262 | -3.0065 | -4.5462 | 0.7310 | 1.5397 | -717.8715 | -585.7884 | -1.2316 | -1.3538 |
0.5134 | 0.9160 | 14000 | 0.5281 | -2.9950 | -4.5567 | 0.7300 | 1.5617 | -718.9149 | -584.6379 | -1.2311 | -1.3549 |
0.7272 | 0.9422 | 14400 | 0.5305 | -3.0852 | -4.6701 | 0.7275 | 1.5849 | -730.2634 | -593.6614 | -1.2166 | -1.3417 |
0.3916 | 0.9684 | 14800 | 0.5299 | -3.0770 | -4.6548 | 0.7265 | 1.5778 | -728.7334 | -592.8383 | -1.2201 | -1.3446 |
0.4814 | 0.9945 | 15200 | 0.5296 | -3.0725 | -4.6501 | 0.7280 | 1.5776 | -728.2595 | -592.3885 | -1.2210 | -1.3453 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 3.2.0
- Tokenizers 0.19.0
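A minimal usage sketch, assuming this repository holds a LoRA adapter that sits on top of the SFT base model; the generation settings are illustrative only.

```python
# Hedged sketch: attach the DPO LoRA adapter to the SFT base model for inference.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "daijiao/zephyr-7b-dpo-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # loads the LoRA weights

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

To match the QLoRA memory footprint at inference time, the base model can instead be loaded in 4-bit with a BitsAndBytesConfig before attaching the adapter.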