AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO Paper • 2502.14669 • Published Feb 20 • 14
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7 • 27
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 39
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 23
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published 27 days ago • 28