arxiv:2503.04625
							
ChengpengLi
AI & ML interests
LLMs for reasoning, reinforcement learning, recommendation systems, diffusion models
Recent Activity
upvoted a paper 18 days ago: Agentic Entropy-Balanced Policy Optimization
						
upvoted a paper about 1 month ago: Quantile Advantage Estimation for Entropy-Safe Reasoning
						
upvoted a paper 3 months ago: We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Organizations
None yet