Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published 9 days ago • 146
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 170
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published Oct 17, 2024 • 19