[arXiv Paper] [MM-RLHF Data] [Homepage] [Reward Model]
[MM-RewardBench] [MM-SafetyBench] [Evaluation Suite]
The Next Step Forward in Multimodal LLM Alignment
[2025/02/10] We are proud to open-source MM-RLHF, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:
- A high-quality MLLM alignment dataset.
- A strong Critique-Based MLLM reward model and its training algorithm.
- A novel alignment algorithm, MM-DPO.
- Two new benchmarks.
Our dataset and algorithms enable consistent performance improvements across 10 dimensions and 27 benchmarks.
Use
Intended use
The model was trained on MM-RLHF data and can interact with single images, multiple images, and videos.
Feel free to share your generations in the Community tab!
Generation
We provide a simple generation example for using our model below. For more details, please refer to the GitHub repository.
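The following is a minimal sketch of single-image inference, assuming the checkpoint can be loaded with the generic `transformers` vision-to-text auto classes; the model id, image path, and generation settings are placeholders rather than the official loading code, which lives in the GitHub repository.

```python
# Minimal sketch, assuming the checkpoint loads with the generic
# transformers vision-to-text auto classes; the model id and image
# path below are placeholders, not the official loading code.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "your-org/your-mm-rlhf-checkpoint"  # placeholder repository id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Prepare one image and a text prompt.
image = Image.open("example.jpg")
prompt = "Describe this image in detail."
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Greedy decoding with a capped number of new tokens.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For multi-image or video inputs, the processor typically accepts a list of frames via `images`; see the GitHub repository for the exact chat template and video frame sampling used by this model.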
Citation
If you find this work useful for your research and applications, please cite the related papers/blogs using the following BibTeX: