---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: apache-2.0
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
metrics:
- accuracy
---
Reward model pretrained on openai/webgpt_comparisons and openai/summarize_from_feedback (human feedback on summaries). Unlike the other electra-large model, this model is trained using a rank loss and one additional dataset.
On the validation set, the results are much more stable than usual.
See this [wandb run](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) for more details.
It performs slightly better than the previous WebGPT-only model: [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm)
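
This card does not spell out the exact form of the rank loss, so the following is only a minimal sketch of one common pairwise formulation, `-log(sigmoid(chosen - rejected))`, together with the pairwise accuracy it is usually evaluated with. The function names `pairwise_rank_loss` and `pairwise_accuracy` are hypothetical and are not taken from the training code.

```python
import math
from typing import Iterable, Tuple


def pairwise_rank_loss(chosen_score: float, rejected_score: float) -> float:
    """Assumed loss form: -log(sigmoid(chosen_score - rejected_score)).

    Equivalent to softplus(-(chosen - rejected)); both branches below
    avoid overflow in exp() for large score differences.
    """
    diff = chosen_score - rejected_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) score pairs ranked correctly."""
    pairs = list(pairs)
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up scores for illustration only, not real model outputs.
    scores = [(1.2, -0.3), (0.1, 0.4), (2.0, 1.5)]
    mean_loss = sum(pairwise_rank_loss(c, r) for c, r in scores) / len(scores)
    print(f"mean loss: {mean_loss:.4f}")
    print(f"accuracy:  {pairwise_accuracy(scores):.4f}")
```

The loss shrinks as the preferred answer's score rises above the rejected answer's score, which is why a model trained this way is typically used by comparing scores between candidate answers rather than reading a single score in isolation.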