yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen

yifanzhang114 committed on Feb 6

Commit e4f86d5 · verified · 1 Parent(s): 51e44dd

Update README.md

Files changed (1)

README.md +39 -3

@@ -1,3 +1,39 @@
----
-license: mit
----

+---
+license: mit
+---
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/bp_DZR79-mTj8Z6GJe9B0.png" width="80%" />
+</p>
+<font size=3><div align='center' >
+[[📖 arXiv Paper](https://arxiv.org/abs/2406.08487)]
+[[📊 MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)]
+[[📝 Homepage](https://mm-rlhf.github.io/)]
+[[🏆 Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)]
+[[🔮 MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)]
+[[🔮 MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)]
+[[📈 Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)]
+</div></font>
+# The Next Step Forward in Multimodal LLM Alignment
+**[2025/02/10]** 🔥 We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:
+- A **high-quality MLLM alignment dataset**.
+- A **strong Critique-Based MLLM reward model** and its training algorithm.
+- A **novel alignment algorithm MM-DPO**.
+- **Two new benchmarks**.
+Our dataset and algorithms enable consistent performance improvements across **10 dimensions** and **27 benchmarks** for open-source MLLMs.
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/8nVZQd8bfB6NJIixCv6_X.png" width="80%" />
+</p>
+## Citation
+If you find this work useful for your research or applications, please cite the related papers/blogs using this BibTeX:
+```bibtex
+```