Text Generation
Transformers
Safetensors
English
reward model
conversational
aashka-trivedi committed
Commit 2eb2719 · 1 Parent(s): d5110ba

update readme

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +2 -2
  3. images/PRM_BON.png +3 -0
.gitattributes CHANGED
@@ -37,3 +37,4 @@ Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
  PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
  images/Benchmark_PRM_perf.png filter=lfs diff=lfs merge=lfs -text
  images/PRM_BoN_rows.png filter=lfs diff=lfs merge=lfs -text
+ images/PRM_BON.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -41,7 +41,7 @@ Before obtaining a response, the model expects the user generated prompt `"Is th
  We show the performance on MATH-500 with inference scaling on a variety of LLM generators, including [Granite-3.3-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct), [Phi-4](https://huggingface.co/microsoft/phi-4), and [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct), and show strong gains over Majority Voting with both Best-of-N and Weighted Majority Voting using Granite-3.3-8B-LoRA-Math-PRM.

 
- <img src="images/PRM_BoN_rows.png" alt="PRM Performance on Math-500" width="5000"/>
+ <img src="images/PRM_BON.png" alt="PRM Performance on Math-500" height="800"/>

 
  We also compare the Best-of-N performance on Math-500 of available PRMs on [Qwen-2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) generations, and show the strong performance of Granite-3.3-8B-LoRA-Math-PRM over majority voting:
@@ -79,7 +79,7 @@ As shown above, Granite-3.3-8B-LoRA-Math-PRM shows strong performance on both Pr
 
  **Training Data**

- For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), and [NuminaMath](huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.
+ For training the Math PRM adapter, we curate training data from a diverse set of model responses to prompts from Math-specific datasets, specifically [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), [MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), and [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT). We leverage a diverse set of LLMs from the Granite Language Model Family, Phi-4, and Mixtral 8x22B to generate outputs, and use the Automatic Process Supervision method as described in [Luo et al., 2024](https://arxiv.org/abs/2406.06592) for detecting steps with errors.

  **Usage**

images/PRM_BON.png ADDED

Git LFS Details

  • SHA256: 509931e746d99d24338cd9802be510b83d8edc2102e6ac9b13064fa503bc0ea9
  • Pointer size: 131 Bytes
  • Size of remote file: 582 kB