This is the official model released with the paper PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning (arxiv.org/abs/2507.06448).
PAPO (γ=0.01)