This summer, TRL leveled up for multimodal alignment!
- New alignment methods for vision-language models (VLMs): MPO, GRPO, GSPO
- Extended RLOO & Online DPO for VLMs
- Native SFT support
- Ready-to-use training scripts
Read the blog post: https://huggingface.co/blog/trl-vlm-alignment