Reward Model pretrained on openai/webgpt_comparison and humanfeedback summary. Unlike the other electra-large model this model is trained using rank loss with one more datasets.
On validation dataset the result is much more stable than usual.
You can refer to this wandb for more details
Slightly better than previous webgpt only model : electra-large
- Downloads last month
- 12
Inference Providers
NEW
This model is not currently available via any of the supported third-party Inference Providers, and
the model is not deployed on the HF Inference API.