Zhenru Zhang

Zhenru

AI & ML interests

None yet

Recent Activity

authored a paper 26 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

upvoted a paper 26 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

upvoted a paper about 1 month ago

WorldPM: Scaling Human Preference Modeling

Organizations

authored a paper 26 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published 27 days ago • 164

upvoted a paper 26 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published 27 days ago • 164

upvoted 2 papers about 1 month ago

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 33

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 205

updated a model about 1 month ago

Qwen/WorldPM-72B-UltraFeedback

Text Classification • 73B • Updated May 17 • 157k • 4

authored a paper about 1 month ago

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 33

authored a paper 4 months ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 114

upvoted a paper 4 months ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 114

upvoted a paper 5 months ago

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 66

updated 2 models 5 months ago

Qwen/Qwen2.5-Math-7B-PRM800K

Text Classification • 8B • Updated Jan 17 • 2.19k • 17

Qwen/Qwen2.5-Math-PRM-72B

Text Classification • 73B • Updated Jan 17 • 1.87k • 73

New activity in Qwen/Qwen2.5-Math-PRM-7B 5 months ago

Fix backslashes in prompt example

#5 opened 5 months ago by sorokin

"<extra_0>" is not a special token? I got 5 token_ids, is that right?

#4 opened 5 months ago by ShelterW

commented a paper 6 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

updated a model 6 months ago

Qwen/Qwen2.5-Math-PRM-7B

Text Classification • 8B • Updated Jan 17 • 21.4k • 71

New activity in Qwen/Qwen2.5-Math-PRM-7B 6 months ago

Could you clarify whether the PRM800K deduplication was performed using the original 5000-test set from MATH or the MATH500 dataset?

#2 opened 6 months ago by masterLan

Question about the step separator "\n\n"

#3 opened 6 months ago by pixas

authored a paper 6 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

commented a paper 6 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

upvoted a paper 6 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99
