Diffusers documentation
Reinforcement learning training with DDPO
You are viewing version v0.34.0. A newer version, v0.38.0, is available.
You can fine-tune Stable Diffusion on a reward function with reinforcement learning, using the 🤗 TRL library together with 🤗 Diffusers. Training uses the Denoising Diffusion Policy Optimization (DDPO) algorithm, introduced by Black et al. in "Training Diffusion Models with Reinforcement Learning". 🤗 TRL implements DDPO in its DDPOTrainer.
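To give a feel for what DDPO optimizes, here is a minimal sketch of the clipped, importance-weighted policy-gradient loss applied to a single denoising step, written in plain Python. It is an illustration under assumptions, not the DDPOTrainer code: the function names, the clip range of 0.2, and the toy rewards and log-probabilities are all hypothetical, and a real run computes these quantities on tensors for every denoising step of every sampled image.

```python
import math


def normalize_rewards(rewards, eps=1e-8):
    """Standardize raw rewards across the batch to obtain advantages."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_step_loss(new_log_prob, old_log_prob, advantage, clip_range=0.2):
    """PPO-style clipped loss for one denoising step (hypothetical helper).

    new_log_prob: log-probability of the sampled latent under the current model
    old_log_prob: log-probability recorded when the sample was generated
    """
    # Importance ratio between the current policy and the sampling policy.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Take the more pessimistic of the two candidate losses.
    unclipped = -advantage * ratio
    clipped = -advantage * clipped_ratio
    return max(unclipped, clipped)


# Toy data (assumed values): one reward per generated image, and one
# log-probability pair per image for a single denoising step.
rewards = [0.2, 0.9, 0.5, 0.4]
old_log_probs = [-1.0, -1.2, -0.8, -1.1]
new_log_probs = [-1.0, -0.9, -0.8, -1.3]

advantages = normalize_rewards(rewards)
losses = [
    clipped_step_loss(new, old, adv)
    for new, old, adv in zip(new_log_probs, old_log_probs, advantages)
]
mean_loss = sum(losses) / len(losses)
print(f"mean clipped loss: {mean_loss:.4f}")
```

Clipping the ratio keeps a single update from moving the model too far from the policy that generated the samples, which is why the loss stops rewarding ratio increases beyond 1 + clip_range when the advantage is positive.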
For more information, check out the DDPOTrainer API reference and the Finetune Stable Diffusion Models with DDPO via TRL blog post.
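As a rough illustration of the two user-supplied pieces in a DDPO setup, a prompt sampler and a reward function, the sketch below uses plain Python stand-ins. The callable shapes (a prompt function returning a prompt plus metadata, and a reward function taking images, prompts, and metadata) are assumed to mirror those used in the TRL DDPO examples; the prompt list, the brightness reward, and the use of nested lists instead of image tensors are hypothetical. Check the DDPOTrainer API reference for the exact signatures before relying on them.

```python
import random

# Hypothetical prompt pool, used only for this illustration.
PROMPTS = ["a photo of a cat", "a photo of a dog", "a photo of a horse"]


def prompt_fn():
    """Return one training prompt and an (empty) metadata dict."""
    return random.choice(PROMPTS), {}


def brightness_reward(images, prompts, metadata):
    """Toy reward: mean pixel intensity of each image.

    images: list of flat pixel lists with values in [0, 1]; a stand-in
            for the batch of image tensors a real trainer would pass.
    Returns one reward per image and an (empty) metadata dict.
    """
    rewards = [sum(pixels) / len(pixels) for pixels in images]
    return rewards, {}


# Usage with two fake two-pixel "images".
prompt, prompt_metadata = prompt_fn()
fake_images = [[0.0, 1.0], [1.0, 1.0]]
rewards, _ = brightness_reward(fake_images, [prompt, prompt], [prompt_metadata] * 2)
print(prompt, rewards)
```

In practice the reward would come from a learned scorer or a task-specific metric rather than raw brightness; the point here is only the shape of the inputs and outputs.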