This is the official model released for the paper "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" (arxiv.org/abs/2507.06448).
PAPO-H (γ=0.02)