---
language:
  - en
tags:
  - webgpt
  - regression
  - reward-model
license: apache-2.0
datasets:
  - openai/webgpt_comparisons
  - openai/summarize_from_feedback
metrics:
  - accuracy
---

Reward model trained on the openai/webgpt_comparisons and openai/summarize_from_feedback datasets. Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.
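The exact form of the rank loss is not spelled out here. As a rough illustration only, the sketch below assumes the common pairwise formulation, `-log(sigmoid(chosen - rejected))`, averaged over comparison pairs. The function names `pairwise_rank_loss` and `pairwise_accuracy` are hypothetical and not taken from the training code; the scores stand in for the scalar outputs a reward model would produce for the preferred and non-preferred answer of each comparison.

```python
import math


def pairwise_rank_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over all pairs.

    Assumed formulation for illustration; the model's actual loss may differ.
    """
    losses = []
    for pos, neg in zip(chosen_scores, rejected_scores):
        margin = pos - neg
        # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
        if margin >= 0:
            loss = math.log1p(math.exp(-margin))
        else:
            loss = -margin + math.log1p(math.exp(margin))
        losses.append(loss)
    return sum(losses) / len(losses)


def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the preferred answer gets the higher score."""
    pairs = list(zip(chosen_scores, rejected_scores))
    correct = sum(1 for pos, neg in pairs if pos > neg)
    return correct / len(pairs)
```

Under this formulation the loss shrinks as the preferred answer's score moves above the other answer's score, and the accuracy metric simply counts how often that ordering holds.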

On the validation set, the results are much more stable than usual.

You can refer to this [wandb](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) run for more details.


This model is slightly better than the previous WebGPT-only model: [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm)