CodeGoat24/llava-onevision-qwen2-7b-ov-unifiedreward-dpo

CodeGoat24 committed on Mar 10

Commit ad279bc · verified · 1 Parent(s): d552a79

Update README.md

Files changed (1)

README.md +7 -2

@@ -11,7 +11,7 @@ base_model:
 This model is trained on LLaVA-OneVision based on DPO preference data constructed by our [UnifiedReward-7B](https://huggingface.co/CodeGoat24/UnifiedReward-7b) for enhanced image understanding ability.
 For further details, please refer to the following resources:
-- 📰 Paper:
+- 📰 Paper: https://arxiv.org/pdf/2503.05236
 - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
 - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
 - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
@@ -76,5 +76,10 @@ print(text_outputs)
 ## Citation
 ```
+@article{UnifiedReward,
+  title={Unified Reward Model for Multimodal Understanding and Generation},
+  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
+  journal={arXiv preprint arXiv:2503.05236},
+  year={2025}
+}
 ```