This model was trained with the OpenRLHF codebase using the Regularized Preference Optimization (RPO) method with the average loss. The SFT loss coefficient is 0.2. The relevant paper is https://arxiv.org/abs/2405.16436.
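In the linked paper, the RPO objective is a DPO preference loss plus a weighted SFT (negative log-likelihood) loss on the chosen response. The SFT weight corresponds to the coefficient of 0.2 above. The snippet below is a minimal sketch of that combined loss for a single preference pair, using plain Python instead of tensors. It is not OpenRLHF's implementation. The function names `log_sigmoid` and `rpo_loss`, the DPO temperature `beta=0.1`, and the choice to average the SFT term over the chosen response's tokens (one possible reading of "average loss") are all hypothetical assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rpo_loss(
    policy_chosen_logp: float,    # summed log-prob of the chosen response under the policy
    policy_rejected_logp: float,  # summed log-prob of the rejected response under the policy
    ref_chosen_logp: float,       # same quantities under the frozen reference model
    ref_rejected_logp: float,
    chosen_len: int,              # number of tokens in the chosen response
    beta: float = 0.1,            # hypothetical DPO temperature (not stated in the card)
    sft_coef: float = 0.2,        # SFT loss coefficient from the model card
) -> float:
    # DPO term: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    dpo = -log_sigmoid(margin)
    # SFT term: token-averaged NLL of the chosen response (assumed averaging)
    sft = -policy_chosen_logp / chosen_len
    return dpo + sft_coef * sft


# When policy and reference agree, the DPO term is log(2), and the
# chosen NLL of 10 over 5 tokens gives an SFT term of 2.
print(rpo_loss(-10.0, -10.0, -10.0, -10.0, chosen_len=5))  # ≈ 1.0931
```

Setting `sft_coef=0.0` recovers plain DPO. A positive value keeps pulling probability mass toward the chosen responses, which is the regularizing effect the paper attributes to the SFT loss.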