Multimodal Models
Collection
Multimodal models with leading performance.
•
14 items
•
Updated
•
15
RLAIF-V-7B is trained based on LLaVA 1.5 7B with the novel RLAIF-V framework. By aligning with human preference via large scale AI feedback, the model achieves super GPT-4V trustworthiness. RLAIF-V maximally exploits the open-source feedback from two key perspectives, including high-quality feedback data and an online feedback learning algorithm.
Please look at GitHub for more details about usage.
If you find our model/code/paper helpful, please consider cite our papers 📝:
@article{yu2023rlhf,
title={Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback},
author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
journal={arXiv preprint arXiv:2312.00849},
year={2023}
}
@article{yu2024rlaifv,
title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
journal={arXiv preprint arXiv:2405.17220},
year={2024},
}