aashka-trivedi committed · Commit 2eb2719 · Parent(s): d5110ba

update readme

Files changed:
- .gitattributes +1 -0
- README.md +2 -2
- images/PRM_BON.png +3 -0
.gitattributes
CHANGED
@@ -37,3 +37,4 @@ Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
 PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
 images/Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
 images/PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
+images/PRM_BON.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -41,7 +41,7 @@ Before obtaining a response, the model expects the user generated prompt `"Is th
 We show the performance on MATH-500 with inference scaling on a variety of LLM generators, including [Granite-3.3-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct), [Phi-4](https://huggingface.co/microsoft/phi-4), and [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct), and show strong gains over Majority Voting with both Best-of-N and Weighted Majority Voting using Granite-3.3-8B-LoRA-Math-PRM.
 
 
-<img src="images/
+<img src="images/PRM_BON.png" alt="PRM Performance on Math-500" height="800"/>
 
 
 We also compare the Best-of-N performance on MATH-500 of available PRMs on [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) generations, and show the strong performance of Granite-3.3-8B-LoRA-Math-PRM over majority voting:
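The Best-of-N and Weighted Majority Voting strategies mentioned in the hunk above are easy to conflate, so here is a minimal, illustrative sketch of how they typically combine PRM scores. This is not code from this repo: the `Candidate` class, the `min()` aggregation of step scores, and the example numbers are all assumptions (`prod()` or the last step's score are common alternative aggregations).

```python
# Minimal sketch of Best-of-N vs. Weighted Majority Voting over
# PRM-scored candidates. Assumes N candidate solutions have already been
# sampled from a generator, each final answer extracted, and every
# reasoning step scored by the PRM (probability the step is correct).
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str                # final answer parsed from the generation
    step_scores: list[float]   # per-step PRM scores in [0, 1]

    @property
    def score(self) -> float:
        # Solution-level score: the weakest step bounds the whole chain
        # (min-aggregation; an assumption, not this repo's stated choice).
        return min(self.step_scores) if self.step_scores else 0.0


def best_of_n(candidates: list[Candidate]) -> str:
    # Best-of-N: return the answer of the single highest-scoring candidate.
    return max(candidates, key=lambda c: c.score).answer


def weighted_majority_vote(candidates: list[Candidate]) -> str:
    # Weighted Majority Voting: sum PRM scores per distinct answer, so many
    # medium-confidence agreeing candidates can outweigh one high outlier.
    totals: dict[str, float] = defaultdict(float)
    for c in candidates:
        totals[c.answer] += c.score
    return max(totals, key=totals.get)


candidates = [
    Candidate("42", [0.9, 0.8, 0.95]),
    Candidate("41", [0.99, 0.2, 0.97]),
    Candidate("42", [0.7, 0.85, 0.9]),
]
print(best_of_n(candidates))               # "42" (best min-score: 0.8)
print(weighted_majority_vote(candidates))  # "42" (total 1.5 vs. 0.2)
```

Plain (unweighted) Majority Voting, the baseline the README compares against, is the same as `weighted_majority_vote` with every candidate's score fixed to 1.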
@@ -79,7 +79,7 @@ As shown above, Granite-3.3-8B-LoRA-Math-PRM shows strong performance on both Pr
 
 **Training Data**
 
-For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically, [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) and [NuminaMath](huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.
+For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically, [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) and [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.
 
 **Usage**
 
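The Automatic Process Supervision method cited in the hunk above ([Luo et al., 2024](https://arxiv.org/abs/2406.06592)) labels steps without human annotation by locating the first erroneous step via Monte Carlo rollouts and binary search. Below is a hedged sketch of that core idea only, not the paper's or this repo's implementation; `sample_completions` and `is_correct` are hypothetical stand-ins for a generator LLM and a final-answer checker that you would supply.

```python
# Hedged sketch of the binary-search core of Automatic Process
# Supervision. A prefix's Monte Carlo value is the fraction of sampled
# completions that reach the correct final answer; the earliest step
# whose prefix value drops to zero is labeled as the first error.
from typing import Callable, Sequence


def prefix_value(
    steps: Sequence[str],
    k: int,
    sample_completions: Callable[[str], list[str]],  # hypothetical generator
    is_correct: Callable[[str], bool],               # hypothetical checker
) -> float:
    """Monte Carlo estimate of P(correct final answer | first k steps)."""
    prefix = "\n".join(steps[:k])
    rollouts = sample_completions(prefix)  # e.g. 8-16 sampled continuations
    return sum(is_correct(r) for r in rollouts) / max(len(rollouts), 1)


def first_error_step(
    steps: Sequence[str],
    sample_completions: Callable[[str], list[str]],
    is_correct: Callable[[str], bool],
) -> int | None:
    """Binary search for the earliest (1-based) step whose prefix has zero
    Monte Carlo value; returns None if every prefix can still reach the
    correct answer. Assumes monotonicity: once a prefix is unrecoverable,
    every longer prefix is too, so the zero-value region is a suffix.
    """
    lo, hi, first_bad = 1, len(steps), None
    while lo <= hi:
        mid = (lo + hi) // 2
        if prefix_value(steps, mid, sample_completions, is_correct) == 0.0:
            first_bad, hi = mid, mid - 1  # error at or before step `mid`
        else:
            lo = mid + 1  # prefix through `mid` is still recoverable
    # Steps before `first_bad` get positive process labels; `first_bad`
    # (if any) gets a negative label, yielding PRM training data.
    return first_bad
```

In the paper this search is embedded in a larger tree-search loop over candidate solutions; the sketch keeps only the per-solution error-localization step.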
images/PRM_BON.png
ADDED
(binary image file, stored with Git LFS)