metadata

datasets:
  - URSA-MATH/MMathCoT-1M
language:
  - en
  - zh
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text

URSA-8B-PS-GRPO

URSA-8B-PS-GRPO is trained with process-supervision GRPO (PS-GRPO), the method proposed in our paper.
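For readers unfamiliar with GRPO, here is a minimal sketch of the group-relative advantage used by standard, outcome-reward GRPO: each sampled response's reward is normalized against the other responses drawn for the same prompt. This is background only. It is not the process-supervision variant from the paper, and the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group (standard GRPO).

    rewards: scalar rewards for responses sampled from the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group with rewards `[1.0, 0.0, 0.0, 1.0]` yields advantages of roughly `[1, -1, -1, 1]`, while a group whose rewards are all equal yields all zeros and therefore contributes no learning signal.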

Installation

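# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

repo_id = "URSA-MATH/URSA-8B-PS-GRPO"
local_dir = "YOUR_LOCAL_PATH"  # replace with the directory to download into

# Download the full model repository and return its local path.
snapshot_path = snapshot_download(
    repo_id=repo_id,
    local_dir=local_dir,
    revision="main",  # branch, tag, or commit hash to fetch
    cache_dir=None,  # None keeps the default Hugging Face cache location
)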
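After the download finishes, a quick sanity check of the target directory can catch an interrupted or partial transfer. The helper below is a sketch using only the standard library. The name `summarize_snapshot` is hypothetical, and the assumption that the repository ships a top-level `config.json` (as Transformers checkpoints usually do) has not been verified against this repository.

```python
from pathlib import Path


def summarize_snapshot(snapshot_path):
    """Report file count, total size, and config presence for a download.

    Files under a `.cache` directory are skipped, since huggingface_hub may
    store download metadata there when `local_dir` is set.
    """
    root = Path(snapshot_path)
    files = [
        p for p in root.rglob("*")
        if p.is_file() and ".cache" not in p.relative_to(root).parts
    ]
    return {
        "num_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        # Assumption: the checkpoint has a top-level config.json.
        "has_config": (root / "config.json").is_file(),
    }
```

Calling `summarize_snapshot(snapshot_path)` with the value returned by `snapshot_download` gives a small dictionary. A total size far below what an 8B-parameter checkpoint should occupy suggests the download should be rerun.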

Inference

We have adapted vLLM for URSA-8B. Please refer to the GitHub repository for a quick-start inference implementation.

We have also adapted the model for evaluation with VLMEvalKit!

Citation

If you find our paper, model, or data helpful, please give this repo a star 🌟 and cite our article ✏️.

@article{luo2025ursa,
  title={URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics},
  author={Luo, Ruilin and Zheng, Zhuofan and Wang, Yifan and Yu, Yiyao and Ni, Xinzhe and Lin, Zicheng and Zeng, Jin and Yang, Yujiu},
  journal={arXiv preprint arXiv:2501.04686},
  year={2025}
}