zzfive's Collections
RL+reason model
RL + Transformer = A General-Purpose Problem Solver (Paper • 2501.14176 • Published • 28)
Towards General-Purpose Model-Free Reinforcement Learning (Paper • 2501.16142 • Published • 30)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (Paper • 2501.17161 • Published • 120)
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization (Paper • 2412.12098 • Published • 5)
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning (Paper • 2412.09858 • Published • 1)
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (Paper • 2501.18585 • Published • 61)
o3-mini vs DeepSeek-R1: Which One is Safer? (Paper • 2501.18438 • Published • 24)
s1: Simple test-time scaling (Paper • 2501.19393 • Published • 118)
Process Reinforcement through Implicit Rewards (Paper • 2502.01456 • Published • 60)
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles (Paper • 2502.01081 • Published • 14)
Improving Transformer World Models for Data-Efficient RL (Paper • 2502.01591 • Published • 9)
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (Paper • 2502.02508 • Published • 23)
Demystifying Long Chain-of-Thought Reasoning in LLMs (Paper • 2502.03373 • Published • 59)
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking (Paper • 2502.02339 • Published • 22)
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods (Paper • 2502.01618 • Published • 10)
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation (Paper • 2502.03860 • Published • 24)
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models (Paper • 2502.04404 • Published • 24)
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (Paper • 2502.06703 • Published • 150)
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates (Paper • 2502.06772 • Published • 21)
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! (Paper • 2502.07374 • Published • 39)
Teaching Language Models to Critique via Reinforcement Learning (Paper • 2502.03492 • Published • 24)
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging (Paper • 2502.09056 • Published • 31)
Logical Reasoning in Large Language Models: A Survey (Paper • 2502.09100 • Published • 22)
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks (Paper • 2502.08235 • Published • 57)
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model (Paper • 2502.11775 • Published • 8)
Soundwave: Less is More for Speech-Text Alignment in LLMs (Paper • 2502.12900 • Published • 84)
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? (Paper • 2502.12215 • Published • 16)
Small Models Struggle to Learn from Strong Reasoners (Paper • 2502.12143 • Published • 34)
Thinking Preference Optimization (Paper • 2502.13173 • Published • 17)
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning (Paper • 2502.14768 • Published • 48)
LightThinker: Thinking Step-by-Step Compression (Paper • 2502.15589 • Published • 28)
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer (Paper • 2502.15631 • Published • 9)
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models (Paper • 2502.16033 • Published • 17)
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (Paper • 2502.18449 • Published • 73)
Self-rewarding correction for mathematical reasoning (Paper • 2502.19613 • Published • 84)
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning (Paper • 2502.19634 • Published • 63)
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving (Paper • 2502.20238 • Published • 24)
Visual-RFT: Visual Reinforcement Fine-Tuning (Paper • 2503.01785 • Published • 76)
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers (Paper • 2502.20545 • Published • 20)
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs (Paper • 2503.01307 • Published • 37)
Efficient Test-Time Scaling via Self-Calibration (Paper • 2503.00031 • Published • 14)
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition (Paper • 2503.00735 • Published • 20)
START: Self-taught Reasoner with Tools (Paper • 2503.04625 • Published • 108)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (Paper • 2503.05179 • Published • 44)
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model (Paper • 2503.05132 • Published • 55)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (Paper • 2503.05592 • Published • 25)
Learning from Failures in Multi-Attempt Reinforcement Learning (Paper • 2503.04808 • Published • 17)
An Empirical Study on Eliciting and Improving R1-like Reasoning Models (Paper • 2503.04548 • Published • 8)
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning (Paper • 2503.07365 • Published • 57)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (Paper • 2503.06749 • Published • 27)
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL (Paper • 2503.07536 • Published • 84)
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training (Paper • 2503.08525 • Published • 15)
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning (Paper • 2503.09516 • Published • 27)
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing (Paper • 2503.10639 • Published • 48)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning (Paper • 2503.10291 • Published • 34)
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond (Paper • 2503.10460 • Published • 27)
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization (Paper • 2503.10615 • Published • 16)
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey (Paper • 2503.12605 • Published • 34)
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization (Paper • 2503.12937 • Published • 28)
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs (Paper • 2503.11751 • Published • 16)
DAPO: An Open-Source LLM Reinforcement Learning System at Scale (Paper • 2503.14476 • Published • 119)
Temporal Consistency for LLM Reasoning Process Error Identification (Paper • 2503.14495 • Published • 9)
Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs (Paper • 2503.12303 • Published • 7)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks (Paper • 2503.15478 • Published • 10)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models (Paper • 2503.16419 • Published • 70)
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement (Paper • 2503.17352 • Published • 23)
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders (Paper • 2503.18878 • Published • 117)
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild (Paper • 2503.18892 • Published • 30)
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning (Paper • 2503.18013 • Published • 19)
Mind with Eyes: from Language Reasoning to Multimodal Reasoning (Paper • 2503.18071 • Published • 3)
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing (Paper • 2503.19385 • Published • 33)
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking (Paper • 2503.19855 • Published • 26)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning (Paper • 2503.19470 • Published • 17)
ViLBench: A Suite for Vision-Language Process Reward Modeling (Paper • 2503.20271 • Published • 7)
Video-R1: Reinforcing Video Reasoning in MLLMs (Paper • 2503.21776 • Published • 78)
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning (Paper • 2503.16081 • Published • 26)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond (Paper • 2503.21614 • Published • 39)
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback (Paper • 2503.22230 • Published • 43)
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model (Paper • 2503.24290 • Published • 61)
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models (Paper • 2503.24235 • Published • 52)
Efficient Inference for Large Reasoning Models: A Survey (Paper • 2503.23077 • Published • 45)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 (Paper • 2503.24376 • Published • 37)
Z1: Efficient Test-time Scaling with Code (Paper • 2504.00810 • Published • 25)
Improved Visual-Spatial Reasoning via R1-Zero-Like Training (Paper • 2504.00883 • Published • 60)
Understanding R1-Zero-Like Training: A Critical Perspective (Paper • 2503.20783 • Published • 43)
Inference-Time Scaling for Generalist Reward Modeling (Paper • 2504.02495 • Published • 52)
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (Paper • 2504.02587 • Published • 30)
Rethinking Reflection in Pre-Training (Paper • 2504.04022 • Published • 75)
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models (Paper • 2504.04823 • Published • 29)
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (Paper • 2504.05118 • Published • 24)
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) (Paper • 2504.03151 • Published • 12)
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning (Paper • 2504.06958 • Published • 10)
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning (Paper • 2504.07128 • Published • 80)
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (Paper • 2504.07615 • Published • 29)
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning (Paper • 2504.08837 • Published • 40)
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training (Paper • 2504.09710 • Published • 18)
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning (Paper • 2504.09641 • Published • 15)
Reasoning Models Can Be Effective Without Thinking (Paper • 2504.09858 • Published • 10)
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations (Paper • 2504.10481 • Published • 79)
Efficient Reasoning Models: A Survey (Paper • 2504.10903 • Published • 17)
Efficient Process Reward Model Training via Active Learning (Paper • 2504.10559 • Published • 12)
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce (Paper • 2504.11343 • Published • 12)
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (Paper • 2504.11536 • Published • 50)
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models (Paper • 2504.11468 • Published • 23)