On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published May 23 • 6
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5 • 33
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 297
Scaling Image Tokenizers with Grouped Spherical Quantization Paper • 2412.02632 • Published Dec 3, 2024 • 10
Training and Evaluating Language Models with Template-based Data Generation Paper • 2411.18104 • Published Nov 27, 2024 • 3
Article Revisiting TemplateGSM: Advancing Mathematical Reasoning in Language Models with Template-based Data Generation By yifAI • Nov 14, 2024 • 3
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1, 2024 • 27
General Preference Modeling with Preference Representations for Aligning Language Models Paper • 2410.02197 • Published Oct 3, 2024 • 9
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 53
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 256
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 625