---
license: other
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Qwen/Qwen1.5-7B-Chat
model-index:
- name: Qwen1.5-7B-Dutch-Chat-Dpo
  results: []
---

# Qwen1.5-7B-Dutch-Chat-Dpo

This model is a fine-tuned version of [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2610
- Rewards/chosen: -0.7248
- Rewards/rejected: -2.6224
- Rewards/accuracies: 0.9170
- Rewards/margins: 1.8976
- Logps/rejected: -877.8102
- Logps/chosen: -783.4282
- Logits/rejected: -0.8110
- Logits/chosen: -0.7528

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5503        | 0.1   | 30   | 0.4684          | -0.0439        | -0.6295          | 0.8919             | 0.5856          | -837.9513      | -769.8103    | -0.9335         | -0.8894       |
| 0.4178        | 0.2   | 60   | 0.3568          | -0.3713        | -1.4769          | 0.9015             | 1.1056          | -854.9000      | -776.3594    | -0.8768         | -0.8276       |
| 0.3264        | 0.29  | 90   | 0.3143          | -0.4893        | -1.8730          | 0.9151             | 1.3837          | -862.8228      | -778.7191    | -0.8428         | -0.7929       |
| 0.2999        | 0.39  | 120  | 0.2885          | -0.6832        | -2.3118          | 0.9151             | 1.6286          | -871.5981      | -782.5971    | -0.8260         | -0.7730       |
| 0.3454        | 0.49  | 150  | 0.2749          | -0.7239        | -2.4904          | 0.9189             | 1.7664          | -875.1693      | -783.4113    | -0.8235         | -0.7678       |
| 0.3354        | 0.59  | 180  | 0.2685          | -0.6775        | -2.4859          | 0.9170             | 1.8084          | -875.0795      | -782.4824    | -0.8130         | -0.7574       |
| 0.2848        | 0.68  | 210  | 0.2652          | -0.7157        | -2.5692          | 0.9131             | 1.8535          | -876.7465      | -783.2466    | -0.8157         | -0.7586       |
| 0.3437        | 0.78  | 240  | 0.2621          | -0.7233        | -2.6091          | 0.9151             | 1.8857          | -877.5430      | -783.3994    | -0.8138         | -0.7561       |
| 0.2655        | 0.88  | 270  | 0.2611          | -0.7183        | -2.6154          | 0.9151             | 1.8971          | -877.6708      | -783.2995    | -0.8106         | -0.7524       |
| 0.3442        | 0.98  | 300  | 0.2610          | -0.7248        | -2.6224          | 0.9170             | 1.8976          | -877.8102      | -783.4282    | -0.8110         | -0.7528       |

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2
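The `Rewards/*` and `Logps/*` metrics in the evaluation results are the ones logged during DPO training with TRL. As a rough guide to reading them, the sketch below recomputes these metrics for a batch of preference pairs from summed sequence log-probabilities under the trained policy and the frozen reference model. It assumes the sigmoid DPO loss and `beta = 0.1`; neither the loss type nor `beta` is reported in this card, and `dpo_metrics` and `softplus` are hypothetical helpers, not part of TRL's API.

```python
import math
from typing import Dict, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumption: beta is not reported in this card
) -> Dict[str, float]:
    """Hypothetical helper: sigmoid DPO loss plus the reward metrics.

    Each argument holds one summed sequence log-probability per preference
    pair, from either the policy being trained or the frozen reference model.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (policy log-prob - reference log-prob)
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) == softplus(-margin)
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen answer earns the higher reward
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        # Logps/* report the policy's own log-probabilities, not the rewards
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }
```

For example, `dpo_metrics([-10.0, -12.0], [-15.0, -11.0], [-11.0, -12.0], [-14.0, -12.0])` (toy numbers, not from this model) gives a `rewards/accuracies` of 0.5, since only the first pair has a positive margin. Under this reading, the final evaluation row is internally consistent: Rewards/margins equals Rewards/chosen minus Rewards/rejected (-0.7248 - (-2.6224) = 1.8976). Because the loss is averaged per pair and -log sigmoid(m) is convex in m, the reported loss (0.2610) can exceed the loss of the mean margin, -log sigmoid(1.8976) ≈ 0.14.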
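The training hyperparameters combine into the optimizer schedule roughly as sketched here. `total_train_batch_size` equals `train_batch_size × gradient_accumulation_steps` (1 × 32 = 32), which is consistent with training on a single device. The learning-rate function is intended to mirror the linear-warmup-then-cosine-decay shape of Transformers' `get_cosine_schedule_with_warmup`, with the warmup length taken as the ceiling of `warmup_ratio × total steps`. The total number of optimizer steps is not reported; `TOTAL_STEPS = 306` is an assumption inferred from the results table (step 300 at epoch 0.98), and both function names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum_steps * num_devices


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float = 1e-5,
                          warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step`: linear warmup, then a half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress 0, 0 at progress 1
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total: step 300 is logged at epoch 0.98, so roughly 306 steps
TOTAL_STEPS = 306

print(effective_batch_size(1, 32))  # prints 32, matching total_train_batch_size
for step in (0, 8, 16, 150, 306):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions warmup lasts 16 steps (ceil(306 × 0.05)), so the learning rate reaches its peak of 1e-05 at about epoch 0.05 and then decays toward zero by the end of the single epoch.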