zephyr-7b-dpo-qlora

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5299
  • Rewards/chosen: -3.0720
  • Rewards/rejected: -4.6492
  • Rewards/accuracies: 0.7275
  • Rewards/margins: 1.5772
  • Logps/rejected: -728.1719
  • Logps/chosen: -592.3389
  • Logits/rejected: -1.2212
  • Logits/chosen: -1.3455
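The reward figures above are the implicit DPO rewards computed from the policy and reference log-probabilities, not scores from a separate reward model. As a reminder of the standard definitions (β is the DPO temperature; its value is not reported in this card):

```latex
% Implicit DPO reward of completion y for prompt x (\beta is not reported in this card)
r_\theta(x, y) = \beta \,\bigl[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr]

% Reported metrics, with y_w the chosen and y_l the rejected completion
\text{rewards/chosen}     = \mathbb{E}\bigl[r_\theta(x, y_w)\bigr] \qquad
\text{rewards/rejected}   = \mathbb{E}\bigl[r_\theta(x, y_l)\bigr]
\text{rewards/margins}    = \mathbb{E}\bigl[r_\theta(x, y_w) - r_\theta(x, y_l)\bigr] \qquad
\text{rewards/accuracies} = \Pr\bigl[r_\theta(x, y_w) > r_\theta(x, y_l)\bigr]
```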

Model description

This repository contains a QLoRA (LoRA) adapter for alignment-handbook/zephyr-7b-sft-full, aligned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized preference dataset. Loading the adapter on top of the base SFT model yields the DPO-aligned chat model.

Intended uses & limitations

More information needed

Training and evaluation data

The adapter was trained and evaluated on HuggingFaceH4/ultrafeedback_binarized, a pairwise preference dataset in which each prompt is paired with a chosen and a rejected completion.
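As an illustration of how such a preference dataset is typically loaded, here is a minimal sketch with the datasets library; the split names are assumptions based on the dataset's published configuration and are not stated in this card:

```python
from datasets import load_dataset

# Assumed split names for HuggingFaceH4/ultrafeedback_binarized; the card does not
# say which splits were used for training and evaluation.
train_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

# Each example carries a prompt plus a chosen and a rejected response,
# which is the pairwise format DPO training expects.
print(train_ds.column_names)  # e.g. ['prompt', 'chosen', 'rejected', ...]
```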

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
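For reference, a minimal sketch of how the listed hyperparameters map onto a Hugging Face TrainingArguments object. This is an illustration, not the exact training script used for this run; anything not listed above (such as the output path) is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,      # total_train_batch_size 4, per the card
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                     # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```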

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6917 0.0262 400 0.6917 0.0029 0.0001 0.6350 0.0028 -263.2442 -284.8504 -2.9734 -3.0250
0.6787 0.0523 800 0.6800 0.0242 -0.0033 0.6860 0.0276 -263.5826 -282.7138 -2.9532 -3.0046
0.6348 0.0785 1200 0.6376 -0.0096 -0.1486 0.6755 0.1390 -278.1083 -286.0981 -2.8858 -2.9319
0.629 0.1047 1600 0.6087 -0.3993 -0.6875 0.6785 0.2882 -331.9969 -325.0683 -2.7749 -2.8148
0.5602 0.1309 2000 0.5979 -0.5708 -0.9723 0.6855 0.4014 -360.4759 -342.2224 -2.7488 -2.7916
0.5783 0.1570 2400 0.5952 -0.7444 -1.2632 0.6910 0.5188 -389.5722 -359.5799 -2.6852 -2.7273
0.6364 0.1832 2800 0.6014 -2.0557 -2.8123 0.6970 0.7566 -544.4844 -490.7089 -2.0799 -2.1273
0.6807 0.2094 3200 0.5654 -2.1440 -3.0639 0.7030 0.9199 -569.6410 -499.5395 -1.6977 -1.7604
0.6616 0.2355 3600 0.5712 -2.9371 -3.9619 0.7165 1.0247 -659.4373 -578.8513 -1.2775 -1.3472
0.4475 0.2617 4000 0.5522 -2.1606 -3.0883 0.7250 0.9277 -572.0762 -501.1973 -1.6222 -1.6801
0.5934 0.2879 4400 0.5452 -2.0993 -3.0686 0.7150 0.9693 -570.1054 -495.0656 -1.5863 -1.6559
0.5422 0.3141 4800 0.5520 -2.7041 -3.8442 0.7220 1.1401 -647.6720 -555.5510 -1.5167 -1.5930
0.6307 0.3402 5200 0.5378 -2.2755 -3.3838 0.7285 1.1083 -601.6280 -512.6918 -1.6752 -1.7599
0.7039 0.3664 5600 0.5306 -1.7946 -2.8494 0.7250 1.0548 -548.1910 -464.5987 -1.6121 -1.6982
0.6561 0.3926 6000 0.5516 -2.6777 -4.0196 0.7205 1.3418 -665.2089 -552.9131 -1.6257 -1.7129
0.5698 0.4188 6400 0.5181 -2.1847 -3.1985 0.7365 1.0138 -583.0958 -503.6094 -1.6584 -1.7391
0.5919 0.4449 6800 0.5219 -1.9491 -3.1280 0.7195 1.1790 -576.0514 -480.0444 -1.6888 -1.7826
0.6161 0.4711 7200 0.5417 -2.7779 -4.2107 0.7335 1.4328 -684.3200 -562.9326 -1.4277 -1.5325
0.4585 0.4973 7600 0.5326 -2.4424 -3.8173 0.7355 1.3748 -644.9775 -529.3820 -1.5104 -1.6091
0.7168 0.5234 8000 0.5298 -2.7451 -4.1021 0.7390 1.3569 -673.4548 -559.6511 -1.3613 -1.4625
0.7179 0.5496 8400 0.5450 -3.1455 -4.6991 0.7330 1.5536 -733.1592 -599.6882 -1.2796 -1.3950
0.4405 0.5758 8800 0.5088 -1.9634 -3.1323 0.7425 1.1689 -576.4830 -481.4787 -1.5418 -1.6311
0.4464 0.6020 9200 0.5306 -2.5354 -3.9140 0.7325 1.3786 -654.6471 -538.6789 -1.3558 -1.4605
0.43 0.6281 9600 0.5292 -2.7495 -4.1617 0.7335 1.4122 -679.4191 -560.0843 -1.2192 -1.3258
0.48 0.6543 10000 0.5317 -2.5185 -3.9464 0.7245 1.4279 -657.8862 -536.9896 -1.3340 -1.4473
0.7352 0.6805 10400 0.5257 -2.7204 -4.1745 0.7315 1.4541 -680.6992 -557.1738 -1.3220 -1.4356
0.6986 0.7066 10800 0.5242 -2.8515 -4.3094 0.7300 1.4580 -694.1929 -570.2861 -1.2609 -1.3721
0.4944 0.7328 11200 0.5282 -2.8438 -4.3275 0.7320 1.4837 -695.9977 -569.5184 -1.2780 -1.3930
0.3577 0.7590 11600 0.5159 -2.7874 -4.1731 0.7345 1.3857 -680.5639 -563.8783 -1.3489 -1.4592
0.602 0.7852 12000 0.5213 -2.9605 -4.3944 0.7315 1.4339 -702.6897 -581.1863 -1.2926 -1.4077
0.4698 0.8113 12400 0.5320 -3.2528 -4.8286 0.7300 1.5759 -746.1134 -610.4158 -1.1834 -1.3076
0.4796 0.8375 12800 0.5180 -2.7532 -4.1875 0.7325 1.4343 -681.9944 -560.4576 -1.2848 -1.3996
0.4354 0.8637 13200 0.5226 -2.8473 -4.3400 0.7335 1.4927 -697.2530 -569.8687 -1.2477 -1.3671
0.4068 0.8898 13600 0.5262 -3.0065 -4.5462 0.7310 1.5397 -717.8715 -585.7884 -1.2316 -1.3538
0.5134 0.9160 14000 0.5281 -2.9950 -4.5567 0.7300 1.5617 -718.9149 -584.6379 -1.2311 -1.3549
0.7272 0.9422 14400 0.5305 -3.0852 -4.6701 0.7275 1.5849 -730.2634 -593.6614 -1.2166 -1.3417
0.3916 0.9684 14800 0.5299 -3.0770 -4.6548 0.7265 1.5778 -728.7334 -592.8383 -1.2201 -1.3446
0.4814 0.9945 15200 0.5296 -3.0725 -4.6501 0.7280 1.5776 -728.2595 -592.3885 -1.2210 -1.3453

Framework versions

  • PEFT 0.7.1
  • Transformers 4.44.2
  • Pytorch 2.2.2+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.0
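Since this repository contains a PEFT adapter rather than full model weights, here is a minimal usage sketch. It assumes the base model named above; loading in bfloat16 with device_map="auto" is an assumption, not a requirement of the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base SFT model and apply this DPO QLoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "alignment-handbook/zephyr-7b-sft-full",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "daijiao/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

# Generate with the base model's chat template.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```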