File size: 2,545 Bytes
e4f86d5 ba2aed7 e4f86d5 72b84a5 e4f86d5 831e8f1 e4f86d5 58866a2 e4f86d5 65499e9 e4f86d5 ba2aed7 e4f86d5 5071653 e41e1f2 5071653 a2c647a e4f86d5 9a6e650 ba2aed7 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
---

<font size=3><div align='center' >
[[๐ arXiv Paper](https://arxiv.org/abs/2502.10391)]
[[๐ MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)]
[[๐ Homepage](https://mm-rlhf.github.io/)]
[[๐ Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)]
[[๐ฎ MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)]
[[๐ฎ MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)]
[[๐ Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)]
[[๐ Training Code](https://github.com/yfzhang114/MM-RLHF)]
</div></font>
# The Next Step Forward in Multimodal LLM Alignment
**[2025/02/10]** ๐ฅ We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:
- A **high-quality MLLM alignment dataset**.
- A **strong Critique-Based MLLM reward model** and its training algorithm.
- A **novel alignment algorithm MM-DPO**.
- **Two new benchmarks**.
Our dataset and algorithms enable consistent performance improvements across **10 dimensions** and **27 benchmarks**.\n<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/8nVZQd8bfB6NJIixCv6_X.png" width="80%" />
</p>
## Use
### Intended use
The model was trained on [MM-RLHF data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF) and have the ability to interact with images, multi-image and videos.

**Feel free to share your generations in the Community tab!**
### Generation
We provide the simple generation process for using our model. For more details, you could refer to [Github](https://github.com/yfzhang114/MM-RLHF).
## Citation
If you find it useful for your research and applications, please cite related papers/blogs using this BibTeX:
```bibtex
@article{zhang2025mm,
title={MM-RLHF: The Next Step Forward in Multimodal LLM Alignment},
author={Zhang, Yi-Fan and Yu, Tao and Tian, Haochen and Fu, Chaoyou and Li, Peiyan and Zeng, Jianshu and Xie, Wulin and Shi, Yang and Zhang, Huanyu and Wu, Junkang and others},
journal={arXiv preprint arXiv:2502.10391},
year={2025}
}
``` |