VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL
Abstract
VARD introduces a value-function-based reinforcement learning approach that provides dense, differentiable supervision for diffusion models, improving training efficiency and supporting non-differentiable rewards.
Diffusion models have emerged as powerful generative tools across various domains, yet tailoring pre-trained models to exhibit specific desirable properties remains challenging. While reinforcement learning (RL) offers a promising solution, current methods struggle to simultaneously achieve stable, efficient fine-tuning and support non-differentiable rewards. Furthermore, their reliance on sparse rewards provides inadequate supervision during intermediate steps, often resulting in suboptimal generation quality. Addressing these limitations requires dense and differentiable signals throughout the diffusion process. Hence, we propose VAlue-based Reinforced Diffusion (VARD): a novel approach that first learns a value function predicting the expected reward from intermediate states, and subsequently uses this value function with KL regularization to provide dense supervision throughout the generation process. Our method maintains proximity to the pre-trained model while enabling effective and stable training via backpropagation. Experimental results demonstrate that our approach facilitates better trajectory guidance, improves training efficiency, and extends the applicability of RL to diffusion models optimized for complex, non-differentiable reward functions.
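The abstract describes a two-stage recipe: first regress a value network onto the final reward observed at the end of each denoising trajectory, then fine-tune the diffusion model by backpropagating the predicted value at every intermediate step under a KL penalty toward the frozen pre-trained model. The sketch below is a minimal, assumption-laden illustration of that idea, not the paper's implementation; the module and function names (`ValueNet`, `value_regression_loss`, `dense_policy_loss`) and the Gaussian mean-difference surrogate for the KL term are hypothetical choices made here for brevity.

```python
# Minimal sketch of value-based dense-reward fine-tuning for a diffusion model.
# All names and architectural choices here are illustrative, not the VARD code.
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Predicts the expected final reward from an intermediate noisy state x_t."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, 1))

    def forward(self, x_t, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0          # crude timestep feature
        return self.net(torch.cat([x_t, t_feat], dim=-1)).squeeze(-1)

def value_regression_loss(value_net, x_t, t, final_reward):
    """Stage 1: fit V(x_t, t) to the reward observed at the trajectory's end.
    `final_reward` may come from a non-differentiable scorer; only its value is used."""
    return ((value_net(x_t, t) - final_reward) ** 2).mean()

def dense_policy_loss(value_net, eps, ref_eps, x_t, t, kl_coef=0.1):
    """Stage 2: at every denoising step, push states toward higher predicted value,
    while a KL-style penalty keeps the fine-tuned noise prediction `eps` close to
    the frozen pre-trained prediction `ref_eps` (squared mean difference used here
    as a simple Gaussian-policy surrogate)."""
    value_term = -value_net(x_t, t).mean()                 # maximize predicted reward
    kl_term = ((eps - ref_eps) ** 2).mean()                # proximity to pre-trained model
    return value_term + kl_coef * kl_term
```

Because the reward signal enters only through the learned value network, the second stage stays fully differentiable even when the underlying reward itself is not.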
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning (2025)
- Prior-Guided Diffusion Planning for Offline Reinforcement Learning (2025)
- Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps (2025)
- DRAGON: Distributional Rewards Optimize Diffusion Generative Models (2025)
- InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment (2025)
- Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model (2025)
- Offline Reinforcement Learning with Discrete Diffusion Skills (2025)