TRL documentation
Speeding Up Training
You are viewing version v0.23.1. A newer version, v0.24.0, is available.
Section under construction. Feel free to contribute!
vLLM for fast generation in online methods
Online methods such as GRPO or Online DPO require the model to generate completions during training. Generation is often slow and can significantly increase training time. To speed it up, you can use vLLM, a library that enables fast generation through, among other things, PagedAttention. TRL’s online trainers support vLLM, which can greatly improve training speed.
To use vLLM, first install it using:
pip install vllm
or
pip install "trl[vllm]"
Then, enable it by passing use_vllm=True in the training arguments.
from trl import OnlineDPOConfig
training_args = OnlineDPOConfig(..., use_vllm=True)
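Because use_vllm=True only makes sense when vLLM is actually installed, it can help to check for the package before setting the flag. The following is a minimal sketch using only the Python standard library. The helper names vllm_is_installed and vllm_training_kwargs are hypothetical and are not part of TRL. Falling back to use_vllm=False when the package is missing is an assumption about the desired behavior, not something TRL does for you.

```python
import importlib.util


def vllm_is_installed() -> bool:
    """Return True if the `vllm` package can be located in this environment."""
    # find_spec only locates the package; it does not import it,
    # so the check stays cheap even though vLLM itself is heavy to import.
    return importlib.util.find_spec("vllm") is not None


def vllm_training_kwargs() -> dict:
    """Hypothetical helper (not a TRL API): build the `use_vllm` keyword argument."""
    # Assumption: fall back to regular generation when vLLM is not installed.
    return {"use_vllm": vllm_is_installed()}
```

The returned dictionary can then be unpacked into the training arguments, for example by passing **vllm_training_kwargs() to OnlineDPOConfig alongside its other arguments.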