On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published 15 days ago • 5
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published 15 days ago • 5