R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
kevinpro
AI & ML interests
Reasoning, Chain of Thoughts, Alignment, Factual Consistency, Summarization
Collections
2
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
3
Open Multilingual Reasoning Leaderboard
🦊Display and search a leaderboard of math models
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Paper • 2401.06838
kevinpro/MNumGLUESub
Updated • 11
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • Updated • 9
Papers
1
Spaces
3
Models
15

kevinpro/R-PRM-7B-DPO
Text Generation • Updated • 14

kevinpro/Hydra-LLaMA3-8B-0531-preview-Q4_K_M-GGUF
Text Generation • Updated • 2

kevinpro/MistralMathOctopus-7B
Text Generation • Updated • 11

kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • Updated • 9

kevinpro/MathOctopus-MAPO-DPO-7B
Text Generation • Updated • 11

kevinpro/MetaMathOctopus-13B
Text Generation • Updated • 18

kevinpro/MetaMathOctopus-MAPO-DPO-7B
Text Generation • Updated • 14

kevinpro/MetaMathOctopus-7B
Text Generation • Updated • 10

kevinpro/MathOctopus-MAPO-DPO-13B
Text Generation • Updated • 9

kevinpro/MistralMathOctopus-MAPO-DPO-7B
Text Generation • Updated