You are viewing version v0.12.0. A newer version, v0.15.1, is available.
TRL - Transformer Reinforcement Learning
TRL is a full-stack library that provides a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning (SFT) step and the Reward Modeling (RM) step to the Proximal Policy Optimization (PPO) step. The library is integrated with 🤗 transformers.
Check the appropriate sections of the documentation depending on your needs:
API documentation
- Model Classes: A brief overview of what each public model class does.
- SFTTrainer: Supervised fine-tuning of your model with SFTTrainer.
- RewardTrainer: Train your reward model with RewardTrainer.
- PPOTrainer: Further fine-tune the supervised fine-tuned model using the PPO algorithm.
- Best-of-N Sampling: Use best-of-n sampling as an alternative way to sample predictions from your active model.
- DPOTrainer: Direct Preference Optimization training with DPOTrainer.
- TextEnvironment: Text environment to train your model to use tools with RL.
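As a rough illustration of the best-of-n idea mentioned in the list above, the sketch below samples several completions for a prompt, scores each one, and keeps the highest-scoring one. It is a minimal sketch in plain Python and does not use TRL's own API: the names `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical stand-ins for a real sampling model and a real reward model.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n completions for the prompt and return the best one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent candidates from the (assumed stochastic) generator.
    candidates = [generate(prompt) for _ in range(n)]
    # Score every candidate with the reward function.
    scores = [reward(prompt, candidate) for candidate in candidates]
    # Keep the candidate with the highest score (first one wins on ties).
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]


# Hypothetical toy stand-ins: a real setup would sample from a language
# model and score with a trained reward model instead.
_rng = random.Random(0)
_ENDINGS = ["was dull.", "was great.", "was great and truly wonderful."]


def toy_generate(prompt: str) -> str:
    """Append a randomly chosen ending to the prompt."""
    return f"{prompt} {_rng.choice(_ENDINGS)}"


def toy_reward(prompt: str, completion: str) -> float:
    """Count occurrences of a few positive words in the completion."""
    positive_words = ("great", "wonderful")
    return float(sum(completion.count(word) for word in positive_words))


if __name__ == "__main__":
    best, score = best_of_n("The movie", toy_generate, toy_reward, n=8)
    print(best, score)
```

Passing the generator and the reward function as callables keeps the selection logic independent of any particular model, so the same function can wrap whichever sampling and scoring code is actually in use.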
Examples
- Sentiment Tuning: Fine-tune your model to generate positive movie content
- Training with PEFT: Memory efficient RLHF training using adapters with PEFT
- Detoxifying LLMs: Detoxify your language model through RLHF
- StackLlama: End-to-end RLHF training of a Llama model on a Stack Exchange dataset
- Learning with Tools: Walkthrough of using TextEnvironments
- Multi-Adapter Training: Use a single base model and multiple adapters for memory efficient end-to-end training
Blog posts
Preference Optimization for Vision Language Models with TRL
Illustrating Reinforcement Learning from Human Feedback
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
StackLLaMA: A hands-on guide to train LLaMA with RLHF
Fine-tune Llama 2 with DPO
Finetune Stable Diffusion Models with DDPO via TRL