allenai / Llama-3.1-8B-Instruct-RM-RB2

Text Classification

text-generation-inference


saumyamalik committed 5 days ago

Commit 07ef68c · verified · 1 Parent(s): 670d183

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -18,7 +18,7 @@ library_name: transformers
 Llama-3.1-8B-Instruct-RM-RB2 is one of 7 sets of reward models (RMs) released with Reward Bench 2.
 We have released a large set of 70 total reward model checkpoints that we used to develop the benchmark and correlate it with downstream PPO / Best-of-N performance.
-[Models](https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51) | [Code](https://github.com/allenai/reward-bench) |  [Eval. Dataset v2](https://huggingface.co/datasets/allenai/reward-bench-2) | [Results v2](https://huggingface.co/datasets/allenai/reward-bench-2-results) | [Paper](https://github.com/allenai/reward-bench/blob/main/paper-v2.pdf)
+[Models](https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51) | [Code](https://github.com/allenai/reward-bench) |  [Eval. Dataset v2](https://huggingface.co/datasets/allenai/reward-bench-2) | [Results v2](https://huggingface.co/datasets/allenai/reward-bench-2-results) | [Paper](https://arxiv.org/abs/2506.01937)
 ## Model Details