MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 126
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models Paper • 2506.04180 • Published Jun 4 • 33
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization Paper • 2502.16825 • Published Feb 24 • 7