VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization Paper • 2505.19000 • Published 13 days ago • 42
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting Paper • 2505.18822 • Published 13 days ago • 14
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published 15 days ago • 85
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published 21 days ago • 57
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published 30 days ago • 176
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 61
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Paper • 2503.04222 • Published Mar 6 • 15
VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper • 2502.12084 • Published Feb 17 • 30