---
license: mit
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/bp_DZR79-mTj8Z6GJe9B0.png" width="80%" />
</p>
<font size=3><div align='center' >
[[📖 arXiv Paper](https://arxiv.org/abs/2406.08487)]
[[📊 MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)]
[[📝 Homepage](https://mm-rlhf.github.io/)]
[[🏆 Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)]
[[🔮 MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)]
[[🔮 MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)]
[[📈 Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)]
</div></font>
# The Next Step Forward in Multimodal LLM Alignment
**[2025/02/10]** 🔥 We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:
- A **high-quality MLLM alignment dataset**.
- A **strong Critique-Based MLLM reward model** and its training algorithm.
- A **novel alignment algorithm, MM-DPO**.
- **Two new benchmarks**.
Our dataset and algorithms enable consistent performance improvements across **10 dimensions** and **27 benchmarks** for open-source MLLMs.
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/8nVZQd8bfB6NJIixCv6_X.png" width="80%" />
</p>
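To explore the alignment data directly, the sketch below pulls it from the Hub with the 🤗 `datasets` library. This is a minimal example under assumptions: the `train` split name and the example fields are not guaranteed by this card, so inspect the schema before relying on any particular column.

```python
# Minimal sketch: load the MM-RLHF preference data from the Hugging Face Hub.
# Assumption: the dataset exposes a "train" split; field names vary by release,
# so print the schema before depending on specific columns.
from datasets import load_dataset

ds = load_dataset("yifanzhang114/MM-RLHF", split="train")

print(ds)            # number of rows and column names
print(ds[0].keys())  # inspect one example's fields
```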
## Citation
If you find this project useful for your research or applications, please cite the related papers using this BibTeX:
```bibtex
```