Command Line Interfaces (CLIs)
TRL provides a powerful command-line interface (CLI) to fine-tune large language models (LLMs) using methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and more. The CLI abstracts away much of the boilerplate, letting you launch training jobs quickly and reproducibly.
Currently supported commands are:
Training Commands
- trl dpo: fine-tune an LLM with DPO
- trl grpo: fine-tune an LLM with GRPO
- trl kto: fine-tune an LLM with KTO
- trl sft: fine-tune an LLM with SFT
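All training commands follow the same pattern: point them at a model and a dataset in the format the method expects. As a sketch, a DPO run might look like the following (trl-lib/ultrafeedback_binarized is used here only as a stand-in preference dataset; substitute your own):

trl dpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/ultrafeedback_binarized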
Other Commands
- trl env: get the system information
- trl vllm-serve: serve a model with vLLM
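As a quick illustration, serving a model with vLLM might look like this (a minimal sketch; the model name is only an example, and we assume --model is the flag that names the model to serve):

trl vllm-serve --model Qwen/Qwen2.5-0.5B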
Fine-Tuning with the TRL CLI
Basic Usage
You can launch training directly from the CLI by specifying required arguments like the model and dataset:
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb
Using Configuration Files
To keep your CLI commands clean and reproducible, you can define all training arguments in a YAML configuration file:
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
Launch with:
trl sft --config sft_config.yaml
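Flags passed on the command line should take precedence over values loaded from the YAML, so a shared config file can serve as a base that individual runs override. A sketch, assuming that override behavior and using --output_dir (a standard training argument) as illustration:

# sft_config.yaml supplies the defaults; the explicit flag should win.
trl sft --config sft_config.yaml --output_dir my_sft_run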
Scaling Up with Accelerate
The TRL CLI natively supports 🤗 Accelerate, making it easy to scale training across multiple GPUs or machines, or to use advanced setups like DeepSpeed, all from the same CLI.
You can pass any accelerate launch arguments directly to trl, such as --num_processes. For more information see Using accelerate launch.

trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --num_processes 4
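Other accelerate launch options can be forwarded the same way. For instance, assuming the passthrough described above also covers --mixed_precision (a standard accelerate launch argument), a mixed-precision multi-process run might look like:

trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --num_processes 8 \
    --mixed_precision bf16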
Using --accelerate_config for Accelerate Configuration
The --accelerate_config flag lets you easily configure distributed training with 🤗 Accelerate. This flag accepts either:
- the name of a predefined config profile (built into TRL), or
- a path to a custom Accelerate YAML config file.
Predefined Config Profiles
TRL provides several ready-to-use Accelerate configs to simplify common training setups:
| Name | Description |
| --- | --- |
| fsdp1 | Fully Sharded Data Parallel Stage 1 |
| fsdp2 | Fully Sharded Data Parallel Stage 2 |
| zero1 | DeepSpeed ZeRO Stage 1 |
| zero2 | DeepSpeed ZeRO Stage 2 |
| zero3 | DeepSpeed ZeRO Stage 3 |
| multi_gpu | Multi-GPU training |
| single_gpu | Single-GPU training |
To use one of these, just pass the name to --accelerate_config. TRL will automatically load the corresponding config file from trl/accelerate_config/.
Example Usage
trl sft \
--model_name_or_path Qwen/Qwen2.5-0.5B \
--dataset_name stanfordnlp/imdb \
--accelerate_config zero2 # or path/to/my/accelerate/config.yaml
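If none of the predefined profiles fit, pass a path to your own file instead. A minimal sketch, assuming the standard Accelerate YAML fields (run accelerate config to generate a complete file interactively):

# Write a small custom Accelerate config; field names follow the
# standard `accelerate config` YAML format. Adjust to your hardware.
cat > my_accelerate_config.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_processes: 4
mixed_precision: bf16
EOF

trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --accelerate_config my_accelerate_config.yaml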
Getting the System Information
You can get the system information by running the following command:
trl env
This will print out the system information, including the GPU information, the CUDA version, the PyTorch version, the transformers version, the TRL version, and any optional dependencies that are installed.
Copy-paste the following information when reporting an issue:

- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.1
- accelerator(s): NVIDIA H100 80GB HBM3
- Transformers version: 4.45.0.dev0
- Accelerate version: 0.34.2
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - mixed_precision: no
  - use_cpu: False
  - debug: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - deepspeed_config: {'gradient_accumulation_steps': 4, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- Datasets version: 3.0.0
- HF Hub version: 0.24.7
- TRL version: 0.12.0.dev0+acb4d70
- bitsandbytes version: 0.41.1
- DeepSpeed version: 0.15.1
- Diffusers version: 0.30.3
- Liger-Kernel version: 0.3.0
- LLM-Blender version: 0.0.2
- OpenAI version: 1.46.0
- PEFT version: 0.12.0
- vLLM version: not installed
This information is required when reporting an issue.