Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 32
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published 30 days ago • 94
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 87
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published 30 days ago • 131
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published 30 days ago • 25
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning Paper • 2505.20355 • Published May 26 • 36
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26 • 13
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow Paper • 2505.17399 • Published May 23 • 14
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 42
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 59
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 61
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 56
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19 • 35
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 23
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Paper • 2506.01320 • Published 27 days ago • 16
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published about 1 month ago • 28
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published 26 days ago • 32
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published 27 days ago • 27
CodeContests+: High-Quality Test Case Generation for Competitive Programming Paper • 2506.05817 • Published 23 days ago • 8
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published 28 days ago • 29
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Paper • 2506.08012 • Published 20 days ago • 7
Dreamland: Controllable World Creation with Simulator and Generative Models Paper • 2506.08006 • Published 20 days ago • 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published 23 days ago • 72
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published 20 days ago • 18
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published 19 days ago • 34
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published 20 days ago • 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework Paper • 2506.02454 • Published 26 days ago • 5
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published 24 days ago • 16
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published 23 days ago • 28
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team Paper • 2506.14234 • Published 12 days ago • 38
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers Paper • 2506.14702 • Published 12 days ago • 3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published 21 days ago • 28
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Paper • 2506.10082 • Published 18 days ago • 8
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20 • 22
Optimizing Length Compression in Large Reasoning Models Paper • 2506.14755 • Published 12 days ago • 10
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published 9 days ago • 9
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published 6 days ago • 26
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published 10 days ago • 79