---
language:
  - en
tags:
  - webgpt
  - regression
  - reward-model
license: apache-2.0
datasets:
  - openai/webgpt_comparisons
  - openai/summarize_from_feedback
metrics:
  - accuracy
---

Reward model trained on the `openai/webgpt_comparisons` and `openai/summarize_from_feedback` (human feedback summary) datasets. Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.
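
For readers unfamiliar with rank losses, here is a minimal sketch of one common pairwise form, `-log(sigmoid(r_chosen - r_rejected))`, averaged over comparison pairs. This is an assumption for illustration: the card does not state the exact loss used, and the function name `pairwise_rank_loss` and the example scores are hypothetical, not taken from the training code.

```python
import math


def pairwise_rank_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over all comparison pairs.

    Illustrative only: the exact loss used to train this model is not
    documented in the card.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("at least one pair is required")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


# Hypothetical reward scores for three (preferred, rejected) answer pairs.
loss = pairwise_rank_loss([1.2, 0.3, 2.0], [0.4, 0.9, -1.0])
print(f"rank loss: {loss:.4f}")
```

The loss shrinks as the preferred answer is scored further above the rejected one, so only the relative order of the two scores matters, not their absolute values.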

On the validation set, the results are much more stable than usual.

See this [wandb run](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) for more details.


This model performs slightly better than the previous WebGPT-only model: [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm).