This model is fine-tuned from the glorgao/Qwen2.5-7B-SFT model using the SelectiveDPO algorithm on the Ultrafeedback_binarized dataset.

For the recipe to reproduce this model, please visit our GitHub page.

Downloads last month
31
Safetensors
Model size
7.07B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for glorgao/SelectiveDPO-Qwen2.5-7B-SFT-UFBinarized

Finetuned
(1)
this model

Dataset used to train glorgao/SelectiveDPO-Qwen2.5-7B-SFT-UFBinarized

Collection including glorgao/SelectiveDPO-Qwen2.5-7B-SFT-UFBinarized