Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 14
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 68
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published May 7 • 64
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 169