Zhenghao Xu

zhenghaoxu

AI & ML interests

None yet

Recent Activity

upvoted an article 3 months ago

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

commented on a paper 4 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

updated a dataset 4 months ago

zhenghaoxu/aime-beyond

Organizations

upvoted an article 3 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 160

commented on a paper 4 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

updated 2 datasets 4 months ago

zhenghaoxu/aime-beyond

Viewer • Updated Feb 22 • 100 • 19

zhenghaoxu/aime-amc23

Viewer • Updated Feb 22 • 40 • 33

published a dataset 4 months ago

zhenghaoxu/aime-amc23

Viewer • Updated Feb 22 • 40 • 33

updated 5 datasets 4 months ago

published 6 datasets 4 months ago

zhenghaoxu/dapo-math-17k

Viewer • Updated Feb 13 • 17.4k • 18

zhenghaoxu/aime-beyond

Viewer • Updated Feb 22 • 100 • 19

zhenghaoxu/aime-2026

Viewer • Updated Feb 22 • 30 • 73

zhenghaoxu/aime-2025

Viewer • Updated Feb 22 • 30 • 13

zhenghaoxu/aime-2024

Viewer • Updated Feb 22 • 30 • 28

zhenghaoxu/math-aime-eval

Viewer • Updated Feb 22 • 230 • 10

upvoted 2 papers 4 months ago

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Paper • 2602.05933 • Published Feb 5 • 6

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Paper • 2602.01058 • Published Feb 1 • 45

liked 2 models 6 months ago

inclusionAI/LLaDA2.0-flash

103B • Updated Dec 19, 2025 • 158 • 69

inclusionAI/LLaDA2.0-mini

Text Generation • 16B • Updated Apr 13 • 245k • 67
