# zephyr-7b-dpo-qlora
This model is a fine-tuned version of FaeMo/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.4984
- Rewards/chosen: -1.7141
- Rewards/rejected: -2.7356
- Rewards/accuracies: 0.7380
- Rewards/margins: 1.0215
- Logps/rejected: -520.1021
- Logps/chosen: -442.1446
- Logits/rejected: -0.6472
- Logits/chosen: -0.8116
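These reward columns follow the usual DPO bookkeeping: the implicit reward of a completion is beta * (log-prob under the policy minus log-prob under the reference model), Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of preference pairs whose chosen reward exceeds the rejected one. A minimal sketch, assuming a TRL-style DPOTrainer (beta and all log-probabilities below are illustrative placeholders, not values from this run):

```python
beta = 0.1  # DPO temperature; the actual value is not stated in this card

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

# One hypothetical preference pair from the eval set:
chosen = implicit_reward(policy_logp=-440.0, ref_logp=-423.0)    # -> Rewards/chosen
rejected = implicit_reward(policy_logp=-520.0, ref_logp=-493.0)  # -> Rewards/rejected

margin = chosen - rejected    # -> Rewards/margins (averaged over all pairs)
accurate = chosen > rejected  # -> Rewards/accuracies (fraction of such pairs)
print(chosen, rejected, margin, accurate)
```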
## Model description
More information needed
## Intended uses & limitations
More information needed
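In the meantime, a minimal inference sketch. It assumes the adapter repo FaeMo/zephyr-7b-dpo-qlora bundles its tokenizer (with a chat template, as the zephyr recipes do), that bitsandbytes is installed, and that the 7B base fits in 4-bit on your GPU; nothing here is an official usage recipe from the authors:

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

model_id = "FaeMo/zephyr-7b-dpo-qlora"  # adapter id from this card

# Load the base model in 4-bit (QLoRA-style) and attach the DPO LoRA adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Briefly explain what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```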
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
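The dataset and metric names suggest a trl DPOTrainer setup, as in the alignment-handbook recipes. A hedged sketch of how the values above might map onto trl's DPOConfig; the actual training script is not part of this card, and output_dir and optim below are assumptions:

```python
from trl import DPOConfig

# Hedged reconstruction of the hyperparameters listed above.
# Effective train batch size: 4 per device x 2 GPUs x 4 accumulation steps = 32.
training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # "Adam with betas=(0.9,0.999) and epsilon=1e-08" matches the
    # default torch AdamW configuration:
    optim="adamw_torch",
)
```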
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6637 | 0.0523 | 100 | 0.6629 | -0.0221 | -0.0993 | 0.6960 | 0.0772 | -256.4698 | -272.9435 | -2.0116 | -2.1064 |
| 0.602 | 0.1047 | 200 | 0.6024 | -0.4514 | -0.7874 | 0.6980 | 0.3359 | -325.2777 | -315.8761 | -1.9907 | -2.0787 |
| 0.6039 | 0.1570 | 300 | 0.6019 | -1.2635 | -1.7029 | 0.6940 | 0.4394 | -416.8304 | -397.0779 | -2.0386 | -2.1217 |
| 0.5585 | 0.2094 | 400 | 0.5523 | -1.2961 | -1.9215 | 0.7350 | 0.6254 | -438.6909 | -400.3457 | -1.6961 | -1.7646 |
| 0.5064 | 0.2617 | 500 | 0.5590 | -2.1197 | -2.9173 | 0.7180 | 0.7975 | -538.2687 | -482.7069 | -1.4471 | -1.5362 |
| 0.5405 | 0.3141 | 600 | 0.5277 | -1.1388 | -1.8514 | 0.7460 | 0.7126 | -431.6833 | -384.6164 | -1.3715 | -1.4743 |
| 0.5165 | 0.3664 | 700 | 0.5212 | -1.1986 | -1.9850 | 0.7440 | 0.7864 | -445.0399 | -390.5944 | -1.1511 | -1.2683 |
| 0.545 | 0.4187 | 800 | 0.5156 | -1.3880 | -2.1939 | 0.7410 | 0.8059 | -465.9337 | -409.5362 | -0.9338 | -1.0790 |
| 0.5079 | 0.4711 | 900 | 0.5144 | -1.5228 | -2.4328 | 0.7320 | 0.9100 | -489.8168 | -423.0117 | -1.2508 | -1.3680 |
| 0.4872 | 0.5234 | 1000 | 0.5079 | -1.5868 | -2.4746 | 0.7330 | 0.8879 | -494.0053 | -429.4106 | -1.0344 | -1.1743 |
| 0.4962 | 0.5758 | 1100 | 0.5052 | -1.3941 | -2.2720 | 0.7420 | 0.8779 | -473.7390 | -410.1423 | -0.9681 | -1.1130 |
| 0.494 | 0.6281 | 1200 | 0.5027 | -1.7515 | -2.7598 | 0.7390 | 1.0082 | -522.5185 | -445.8872 | -0.7778 | -0.9262 |
| 0.4848 | 0.6805 | 1300 | 0.5030 | -1.4533 | -2.3941 | 0.7420 | 0.9408 | -485.9530 | -416.0602 | -1.0210 | -1.1597 |
| 0.4792 | 0.7328 | 1400 | 0.5000 | -1.7471 | -2.7718 | 0.7390 | 1.0247 | -523.7210 | -445.4379 | -0.5887 | -0.7571 |
| 0.4773 | 0.7851 | 1500 | 0.4987 | -1.6362 | -2.6113 | 0.7370 | 0.9751 | -507.6723 | -434.3538 | -0.6593 | -0.8222 |
| 0.5122 | 0.8375 | 1600 | 0.4988 | -1.5837 | -2.5412 | 0.7420 | 0.9575 | -500.6636 | -429.1013 | -0.7098 | -0.8688 |
| 0.4726 | 0.8898 | 1700 | 0.4981 | -1.7114 | -2.7207 | 0.7380 | 1.0094 | -518.6156 | -441.8715 | -0.6430 | -0.8071 |
| 0.4909 | 0.9422 | 1800 | 0.4984 | -1.7246 | -2.7501 | 0.7390 | 1.0254 | -521.5475 | -443.1978 | -0.6501 | -0.8142 |
| 0.4967 | 0.9945 | 1900 | 0.4984 | -1.7140 | -2.7353 | 0.7390 | 1.0213 | -520.0685 | -442.1331 | -0.6647 | -0.8274 |
### Framework versions
- PEFT 0.14.0
- Transformers 4.45.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.20.3
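To reproduce this environment, pinning the versions listed above should suffice, e.g. `pip install peft==0.14.0 transformers==4.45.0 datasets==3.3.2 tokenizers==0.20.3` plus a CUDA 12.4 build of PyTorch 2.6.0; exact cross-version compatibility beyond this combination is not claimed by the card.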
## Model tree for FaeMo/zephyr-7b-dpo-qlora
Base model: mistralai/Mistral-7B-v0.1 (this adapter was fine-tuned from FaeMo/zephyr-7b-sft-qlora)