Text Generation
Transformers
Safetensors
English
reward model
conversational
aashka-trivedi committed
Commit 2eb2719 · 1 Parent(s): d5110ba

update readme

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +2 -2
  3. images/PRM_BON.png +3 -0
.gitattributes CHANGED
@@ -37,3 +37,4 @@ Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
  PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
  images/Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
  images/PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
+ images/PRM_BON.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -41,7 +41,7 @@ Before obtaining a response, the model expects the user generated prompt `"Is th
  We show the performance on MATH-500 with inference scaling on a variety of LLM generators, including [Granite-3.3-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct), [Phi-4](https://huggingface.co/microsoft/phi-4), and [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct), and show strong gains over Majority Voting with both Best-of-N and Weighted Majority Voting using Granite-3.3-8B-LoRA-Math-PRM.

 
- <img src="images/PRM_BoN_rows.png" alt="PRM Performance on Math-500" width="5000"/>
+ <img src="images/PRM_BON.png" alt="PRM Performance on Math-500" height="800"/>

 
  We also compare the Best-of-N performance on Math-500 of available PRMs on [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) generations, and show the strong performance of Granite-3.3-8B-LoRA-Math-PRM over majority voting:
@@ -79,7 +79,7 @@ As shown above, Granite-3.3-8B-LoRA-Math-PRM shows strong performance on both Pr
 
  **Training Data**

- For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), and [NuminaMath](huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.
+ For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), and [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.

  **Usage**

images/PRM_BON.png ADDED

Git LFS Details

  • SHA256: 509931e746d99d24338cd9802be510b83d8edc2102e6ac9b13064fa503bc0ea9
  • Pointer size: 131 Bytes
  • Size of remote file: 582 kB