
This model was trained with the OpenRLHF codebase using Regularized Preference Optimization (RPO) with the averaged loss. The SFT loss coefficient is 0.2. The relevant paper is https://arxiv.org/abs/2405.16436.
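As a rough illustration of how an SFT coefficient of 0.2 enters the objective, the sketch below combines a DPO-style preference term with a supervised (negative log-likelihood) term on the chosen response. This is not the OpenRLHF implementation. The function names, the `beta` value, and the reading of "average" as a per-token mean of the SFT term are all assumptions made for this example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=0.1, sft_coef=0.2):
    """Hypothetical RPO-style loss for one preference pair.

    Each argument is a list of per-token log-probabilities of a response
    under the policy or the frozen reference model.
    beta is an assumed value; sft_coef=0.2 is the coefficient stated above.
    """
    # DPO-style preference term on the summed sequence log-probabilities.
    chosen_logratio = sum(policy_chosen) - sum(ref_chosen)
    rejected_logratio = sum(policy_rejected) - sum(ref_rejected)
    preference = -log_sigmoid(beta * (chosen_logratio - rejected_logratio))

    # SFT term: negative log-likelihood of the chosen response,
    # averaged over its tokens (assumed meaning of "average loss").
    sft = -sum(policy_chosen) / len(policy_chosen)

    return preference + sft_coef * sft, preference, sft


# Example: the policy equals the reference, so the preference term is log(2).
total, preference, sft = rpo_loss(
    policy_chosen=[-0.5, -1.5],
    policy_rejected=[-2.0, -1.0],
    ref_chosen=[-0.5, -1.5],
    ref_rejected=[-2.0, -1.0],
)
print(total, preference, sft)
```

The SFT term keeps the policy's likelihood of the chosen response from drifting too low while the preference term widens the margin between chosen and rejected responses; the coefficient sets the balance between the two.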

Model size: 8.54B params
Tensor type: BF16 (Safetensors)

Dataset used to train ZHLiu627/zephyr-7b-gemma-rpo-avg