allenai/Llama-3.1-70B-Instruct-RM-RB2

Text Classification

text-generation-inference


saumyamalik committed 6 days ago

Commit 664f2a0 · verified · 1 Parent(s): c3bf353

Update README.md

Files changed (1)

README.md +2 -2


@@ -15,7 +15,7 @@ library_name: transformers
 <!-- Provide a quick summary of what the model is/does. -->
-{{MODEL_NAME_HERE}} is one of 6 sets of reward models (RMs) released with Reward Bench 2.
+{{MODEL_NAME_HERE}} is one of 7 sets of reward models (RMs) released with Reward Bench 2.
 We have released a large set of 70 total reward model checkpoints that we used to develop the benchmark and correlate it with downstream PPO / Best-of-N performance.
 [Models](https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51) | [Code](https://github.com/allenai/reward-bench) |  [Eval. Dataset v2](https://huggingface.co/datasets/allenai/reward-bench-2) | [Results v2](https://huggingface.co/datasets/allenai/reward-bench-2-results) | [Paper](https://github.com/allenai/reward-bench/blob/main/paper-v2.pdf)
@@ -24,7 +24,7 @@ We have released a large set of 70 total reward model checkpoints that we used t
 ## Model Details
 The model is a standard classifier, `AutoModelForSequenceClassification` within the HuggingFace ecosystem, trained on binary preference data.
-For each model in this batch the main revision is the best model we obtained for that base model, and we include all other training data and hyperparamter combinations in the revisions for further research.
+For each model in this batch the main revision is the best model we obtained for that base model, and we include all other training data and hyperparameter combinations in the revisions for further research.
 To load a model from a revision, modify the following:
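
Note for readers of this diff: the README's own loading snippet falls outside the context lines shown here, so it is not reproduced. As a rough sketch only, and assuming the usual `revision` argument of `from_pretrained` in `transformers`, loading a specific revision might look like the following. The helper names `build_load_args` and `load_reward_model` are hypothetical, and `"main"` is the only revision name the README text confirms; any other revision name has to be looked up on the model's branch list.

```python
# Repository named at the top of this commit page.
REPO_ID = "allenai/Llama-3.1-70B-Instruct-RM-RB2"


def build_load_args(revision: str = "main"):
    """Return (repo_id, kwargs) that pin a Hub revision (hypothetical helper).

    `revision` can be a branch name, a tag, or a commit hash.
    """
    if not revision:
        raise ValueError("revision must be a non-empty string")
    return REPO_ID, {"revision": revision}


def load_reward_model(revision: str = "main"):
    """Load tokenizer and classifier at the given revision.

    Warning: this downloads a 70B-parameter checkpoint.
    """
    # Imported lazily so build_load_args works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    repo_id, kwargs = build_load_args(revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id, **kwargs)
    return tokenizer, model
```

Calling `load_reward_model()` with no argument would fetch the main revision, which the README describes as the best model obtained for this base model; passing another revision name selects one of the other training-data and hyperparameter combinations.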