Update README.md
README.md
CHANGED
@@ -15,7 +15,7 @@ library_name: transformers
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-{{MODEL_NAME_HERE}} is one of
+{{MODEL_NAME_HERE}} is one of 7 sets of reward models (RMs) released with Reward Bench 2.
 We have released a large set of 70 total reward model checkpoints that we used to develop the benchmark and correlate it with downstream PPO / Best-of-N performance.
 
 [Models](https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51) | [Code](https://github.com/allenai/reward-bench) | [Eval. Dataset v2](https://huggingface.co/datasets/allenai/reward-bench-2) | [Results v2](https://huggingface.co/datasets/allenai/reward-bench-2-results) | [Paper](https://github.com/allenai/reward-bench/blob/main/paper-v2.pdf)
@@ -24,7 +24,7 @@ We have released a large set of 70 total reward model checkpoints that we used t
 ## Model Details
 
 The model is a standard classifier, `AutoModelForSequenceClassification` within the HuggingFace ecosystem, trained on binary preference data.
-For each model in this batch the main revision is the best model we obtained for that base model, and we include all other training data and
+For each model in this batch the main revision is the best model we obtained for that base model, and we include all other training data and hyperparameter combinations in the revisions for further research.
 
 To load a model from a revision, modify the following:
 
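The diff context stops at the README line "To load a model from a revision, modify the following:", so the README's own snippet is not visible here. As a rough sketch of what pinning a revision can look like, the helper below passes the `revision` argument of `from_pretrained` for both the tokenizer and the `AutoModelForSequenceClassification` classifier. The function name `load_reward_model` and the repository and revision strings in the comments are hypothetical placeholders, not names taken from this commit.

```python
def load_reward_model(repo_id: str, revision: str = "main"):
    """Load a sequence-classification reward model pinned to one Hub revision.

    `repo_id` and `revision` are supplied by the caller; "main" is the
    default branch, which the README describes as the best checkpoint.
    """
    # Imported lazily so that defining the helper does not require
    # transformers to be installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # `revision` accepts a branch name, tag, or commit hash on the Hub.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForSequenceClassification.from_pretrained(
        repo_id, revision=revision
    )
    return model, tokenizer


# Hypothetical usage (placeholder names, requires network access):
# model, tokenizer = load_reward_model("your-org/your-reward-model", revision="some-revision")
```

Leaving `revision` at its default loads the main revision; passing one of the other revision names listed on the model page would load a different training data and hyperparameter combination.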