CodeGoat24 committed
Commit 373d46a · verified · 1 parent: d668d82

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -19,7 +19,7 @@ base_model:
 `Unified-Reward-Think-7b` is the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.
 
 For further details, please refer to the following resources:
-<!-- - 📰 Paper: https://arxiv.org/pdf/2503.05236 -->
+- 📰 Paper: https://arxiv.org/pdf/2505.03318
 - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
 - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
 - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
@@ -112,10 +112,10 @@ print(text_outputs[0])
 ## Citation
 
 ```
-@article{UnifiedReward,
+@article{UnifiedReward-Think,
 title={Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning.},
 author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
-journal={arXiv preprint arXiv:},
+journal={arXiv preprint arXiv:2505.03318},
 year={2025}
 }
 ```
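The substance of this commit is filling in the previously empty `arXiv:` field with the identifier 2505.03318. As a quick sanity check on such an edit, here is a minimal sketch (the regex and helper name are illustrative, not part of this repository) that verifies a string matches the new-style `YYMM.NNNNN` arXiv identifier format:

```python
import re

# New-style arXiv identifiers look like "2505.03318":
# four digits (year/month) then a 4- or 5-digit sequence number.
ARXIV_ID_RE = re.compile(r"^\d{4}\.\d{4,5}$")

def looks_like_arxiv_id(arxiv_id: str) -> bool:
    """Return True if the string matches the new-style arXiv ID shape."""
    return ARXIV_ID_RE.fullmatch(arxiv_id) is not None

# The identifier introduced by this commit passes:
print(looks_like_arxiv_id("2505.03318"))  # True
# The previous, empty field ("arXiv:") would not:
print(looks_like_arxiv_id(""))  # False
```

This only checks the shape of the identifier, not whether the paper exists; it catches the empty-field case the old citation had.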