---
license: mit
---
|
|
|
<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/bp_DZR79-mTj8Z6GJe9B0.png" width="80%" />
</p>
|
|
|
<font size=3><div align='center'>

[[📖 arXiv Paper](https://arxiv.org/abs/2406.08487)]
[[📊 MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)]
[[📝 Homepage](https://mm-rlhf.github.io/)]
[[🏆 Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)]

[[🔮 MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)]
[[🔮 MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)]
[[📈 Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)]

</div></font>
|
|
|
|
|
# The Next Step Forward in Multimodal LLM Alignment
|
|
|
**[2025/02/10]** 🔥 We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:
|
|
|
- A **high-quality MLLM alignment dataset** of 120k fine-grained, human-annotated preference comparison pairs (see the loading sketch after this list).

- A **strong Critique-Based MLLM reward model** and its training algorithm.

- A **novel alignment algorithm, MM-DPO**.

- **Two new benchmarks**: MM-RewardBench and MM-SafetyBench (linked above).
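
As a quick start, the preference data can be loaded directly from the Hugging Face Hub. This is a minimal sketch assuming the standard `datasets` API; the split and field names are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the MM-RLHF preference data from the Hugging Face Hub.
# The "train" split and the field layout are assumptions; see the
# dataset card (yifanzhang114/MM-RLHF) for the actual schema.
ds = load_dataset("yifanzhang114/MM-RLHF", split="train")

sample = ds[0]
print(sample.keys())  # inspect available fields before training
```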
|
|
|
Our dataset and algorithms enable consistent performance improvements across **10 dimensions** and **27 benchmarks** for open-source MLLMs.
|
<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/8nVZQd8bfB6NJIixCv6_X.png" width="80%" />
</p>
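
As a rough illustration of the MM-DPO idea mentioned above, the sketch below extends a standard DPO loss with a dynamically scaled beta driven by the reward margin between chosen and rejected responses. The scaling function, the `tanh` squashing, and the hyperparameters `base_beta` and `k` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mm_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps,
                reward_margin, base_beta=0.1, k=1.0):
    """DPO loss with a dynamically scaled beta (illustrative sketch)."""
    # Log-probability ratios for the policy and the frozen reference model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Assumed dynamic reward scaling: a larger reward-model margin between
    # the chosen and rejected responses strengthens the update. The tanh
    # squashing and the k hyperparameter are illustrative choices.
    beta = base_beta * (1.0 + k * torch.tanh(reward_margin))
    logits = beta * (pi_logratios - ref_logratios)
    return -F.logsigmoid(logits).mean()

# Toy example with per-sample summed log-probabilities.
margins = torch.randn(4)  # margins from a critique-based reward model
loss = mm_dpo_loss(torch.randn(4), torch.randn(4),
                   torch.randn(4), torch.randn(4), reward_margin=margins)
print(loss.item())
```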
|
|
|
## Citation
|
|
|
If you find MM-RLHF useful for your research or applications, please cite the paper using this BibTeX:
|
```bibtex
@article{zhang2025mmrlhf,
  title   = {MM-RLHF: The Next Step Forward in Multimodal LLM Alignment},
  author  = {Zhang, Yi-Fan and others},
  journal = {arXiv preprint arXiv:2406.08487},
  year    = {2025}
}
```
|