Reward model pretrained on openai/webgpt_comparison and the human feedback summary dataset. Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.
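The card does not spell out the exact form of the rank loss. As a rough illustration only, the sketch below shows one common pairwise formulation, where the loss is the mean of -log(sigmoid(r_chosen - r_rejected)) over answer pairs. The function name `pairwise_rank_loss` and the example scores are hypothetical and are not taken from the training code of this model.

```python
import math


def pairwise_rank_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over all pairs.

    Assumption: this is a generic pairwise ranking loss, not necessarily
    the exact loss used to train this model.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one pair")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
        # so that exp() never receives a large positive argument.
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen_scores)


# Hypothetical reward scores for three (preferred, rejected) answer pairs.
chosen = [2.0, 0.5, -1.0]
rejected = [1.0, 0.5, 0.0]
loss = pairwise_rank_loss(chosen, rejected)
```

The loss shrinks as the preferred answer is scored further above the rejected one, and it equals log(2) when both answers receive the same score.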

On the validation set, the results are much more stable than usual.

You can refer to this wandb run for more details.

This model performs slightly better than the previous webgpt-only model, electra-large.

Model: theblackcat102/electra-large-reward-model