R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Paper • 2505.02835 • Published May 5