---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: apache-2.0
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
metrics:
- accuracy
---

Reward model pretrained on openai/webgpt_comparisons and openai/summarize_from_feedback (human feedback on summaries).

Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.

On the validation dataset, the results are much more stable than usual. Refer to this [wandb](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) run for more details.

Slightly better than the previous WebGPT-only model: [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm)
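
The card does not spell out the exact form of the rank loss, so the following is only a minimal sketch under an assumption: the common pairwise logistic formulation, `-log(sigmoid(chosen - rejected))`, where each training pair holds the reward score of the human-preferred answer and the score of the rejected one. The function names and the example scores are hypothetical and are not taken from the training code. The sketch also shows how pairwise accuracy (the metric listed in the metadata) could be computed from the same score pairs.

```python
import math


def pairwise_rank_loss(chosen_score: float, rejected_score: float) -> float:
    """Assumed rank loss: -log(sigmoid(chosen - rejected)).

    Computed as softplus(-(chosen - rejected)) in a numerically stable way.
    """
    diff = chosen_score - rejected_score
    if diff >= 0:
        # log(1 + exp(-diff)) is safe because exp(-diff) <= 1
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite as -diff + log(1 + exp(diff)) to avoid overflow
    return -diff + math.log1p(math.exp(diff))


def pairwise_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen answer scores higher."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Hypothetical reward scores: (score of preferred answer, score of rejected answer)
scores = [(1.2, -0.3), (0.1, 0.4), (2.0, 0.5)]

mean_loss = sum(pairwise_rank_loss(c, r) for c, r in scores) / len(scores)
accuracy = pairwise_accuracy(scores)
print(f"mean rank loss: {mean_loss:.4f}, pairwise accuracy: {accuracy:.2f}")
```

Under this formulation the loss shrinks as the preferred answer's score moves further above the rejected answer's score, so the model only has to learn a relative ordering rather than an absolute regression target.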