RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization Paper • 2508.00222 • Published Jul 31 • 6
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization Paper • 2508.00222 • Published Jul 31 • 6 • 2
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 236
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 54
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16 • 93
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14 • 53
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 266