URSA-8B-PS-GRPO

URSA-8B-PS-GRPO is trained with process-supervised GRPO (PS-GRPO), the method proposed in our paper.

Installation

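from huggingface_hub import snapshot_download

repo_id = "URSA-MATH/URSA-8B-PS-GRPO"
local_dir = "YOUR_LOCAL_PATH"  # replace with the directory to download into

# Download the full model repository and return the path to the local copy
snapshot_path = snapshot_download(
    repo_id=repo_id,
    local_dir=local_dir,
    revision="main",
    cache_dir=None,
)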
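After the download finishes, it can be useful to confirm that the weight files actually arrived. The following is a minimal sketch, assuming the weights are stored as `.safetensors` files somewhere under the snapshot directory; `summarize_snapshot` is a hypothetical helper written for illustration and is not part of `huggingface_hub`.

```python
from pathlib import Path


def summarize_snapshot(snapshot_dir):
    """Return (sorted weight file names, total size in bytes) for a local snapshot.

    Hypothetical helper: it only inspects the local directory and assumes
    the weights use the .safetensors extension.
    """
    root = Path(snapshot_dir)
    # Search recursively so nested layouts are also covered
    weights = sorted(p for p in root.rglob("*.safetensors") if p.is_file())
    total_bytes = sum(p.stat().st_size for p in weights)
    return [p.name for p in weights], total_bytes


# Example usage, with snapshot_path taken from the download step:
# names, total_bytes = summarize_snapshot(snapshot_path)
# print(f"{len(names)} weight file(s), {total_bytes / 1e9:.2f} GB")
```

An empty list of names would suggest the download was interrupted or pointed at the wrong directory.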

Inference

We have adapted vLLM to support URSA-8B. Please refer to the GitHub repository for a quick inference implementation.

Evaluation is also supported through VLMEvalKit.

Citation

If you find our paper, model, or data helpful, please give this repo a star 🌟 and cite our article ✍️.

@article{luo2025ursa,
  title={URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics},
  author={Luo, Ruilin and Zheng, Zhuofan and Wang, Yifan and Yu, Yiyao and Ni, Xinzhe and Lin, Zicheng and Zeng, Jin and Yang, Yujiu},
  journal={arXiv preprint arXiv:2501.04686},
  year={2025}
}
