Improve model card: Add license, pipeline tag, library name, and extended content

#1 · by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +51 -3
README.md CHANGED
@@ -1,9 +1,13 @@
  ---
- datasets:
- - OpenMMReasoner/OpenMMReasoner-RL-74K
  base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
+ datasets:
+ - OpenMMReasoner/OpenMMReasoner-RL-74K
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
  ---
+
  # OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

  <div align="center">
@@ -81,4 +85,48 @@ output_text = processor.batch_decode(
  generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
  )
  print(output_text)
- ```
+ ```
+
+ ## Evaluation Results
+
+ Our **OpenMMReasoner-7B (OMR-7B)** model demonstrates strong performance across a comprehensive suite of multimodal reasoning benchmarks. With only 874K SFT samples and 74K RL samples—significantly less data than many competing methods—our model achieves state-of-the-art or highly competitive results on 9 out of 14 benchmark tasks. Notably, OMR-7B achieves **79.5%** on MathVista testmini (best among all models), **63.8%** on MathVerse testmini (best), and **79.0%** on WeMath loose (best), demonstrating the effectiveness of our transparent two-stage training recipe. This performance validates our emphasis on data quality and rigorous training design over simply scaling dataset size.
+
+ | Model | SFT Data | RL Data | MathVista<br/>testmini | MathVision<br/>test | MathVision<br/>testmini | MathVerse<br/>testmini | DynaMath<br/>worst | WeMath<br/>loose | LogicVista<br/>test | MMMU<br/>val | MMMU-Pro<br/>standard | MMMU-Pro<br/>vision | CharXiv<br/>reas. | CharXiv<br/>desc. |
+ |-------|----------|---------|------------------------|---------------------|-------------------------|------------------------|--------------------|--------------------|---------------------|--------------|-----------------------|---------------------|-------------------|-------------------|
+ | VLAA-Thinker-Qwen2.5-7B | 126k | 25k | 68.0 | 26.4 | - | 48.2 | 22.4 | - | 48.5 | - | - | - | - | - |
+ | ThinkLite-7B-VL | - | 11k | 71.6 | 24.6 | - | 42.9 | 16.5 | - | 42.7 | - | - | - | - | - |
+ | VL-Rethinker-7B | - | 39k | 73.7 | 28.4 | - | 46.4 | 17.8 | - | 42.7 | - | 41.7 | - | - | - |
+ | M2-Reasoning | 6.2M | 102k | 75.0 | 42.1 | - | 40.4 | - | - | 50.6 | - | - | - | - | - |
+ | MMR1 | 1.6M | 15k | 72.0 | 31.8 | 29.0† | 55.4 | 27.9† | 68.0† | 48.9 | 52.4† | 41.1† | 37.1† | 43.5† | 71.1† |
+ | OpenVLThinker-7B | 3.3k | 9.6k | 65.3 | 23.0 | 26.9† | 38.1 | 16.8 | 61.9† | 44.5 | 55.1† | 39.7† | 38.4† | 41.0† | 69.2† |
+ | MM-Eureka-Qwen-7B | - | 15.6k | 72.6 | 28.1 | 32.1† | 45.4 | 23.0 | 59.8† | 46.3 | 54.4† | 40.1† | 37.1† | 42.4† | 74.1† |
+ | OVR-7B | 2M | 300k | 72.1 | **51.8** | 38.2† | 54.6 | 33.5 | 64.8 | **54.8** | 51.8† | **50.2** | 29.1† | 44.5 | 73.6 |
+ | **OMR-7B (ours)** | **874k** | **74k** | **79.5** | 43.6 | **38.8** | **63.8** | **34.9** | **79.0** | 50.0 | **57.8** | 44.1 | **40.6** | **46.1** | 73.5 |
+
+ **Note:** Bold numbers indicate the best performance, and † indicates results reproduced using the authors' checkpoints.
+
+ ## Citation
+
+ If you find OpenMMReasoner useful for your research and applications, please cite using this BibTeX:
+
+ ```bibtex
+ @misc{zhang2025openmmreasonerpushingfrontiersmultimodal,
+ title={OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe},
+ author={Kaichen Zhang and Keming Wu and Zuhao Yang and Kairui Hu and Bin Wang and Ziwei Liu and Xingxuan Li and Lidong Bing},
+ year={2025},
+ eprint={2511.16334},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2511.16334},
+ }
+ ```
+
+ ## Acknowledgements
+
+ We gratefully acknowledge the following open-source projects that made this work possible:
+
+ - [**lmms-eval**](https://github.com/EvolvingLMMs-Lab/lmms-eval) for providing the comprehensive evaluation framework for large multimodal models.
+ - [**lmms-engine**](https://github.com/EvolvingLMMs-Lab/lmms-engine) for the SFT training infrastructure and tools.
+ - [**verl**](https://github.com/volcengine/verl) for the reinforcement learning training framework.
+
+ We thank the developers and contributors of these projects for their excellent work and for making their code publicly available.
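
The new front matter declares `library_name: transformers` and `pipeline_tag: image-text-to-text`, while the diff hunk above shows only the tail of the README's usage snippet (`processor.batch_decode(...)`). Below is a minimal inference sketch of how that call chain typically fits together for a Qwen2.5-VL-based checkpoint such as OMR-7B, assuming the stock Qwen2.5-VL `transformers` API inherited from the base model; the repository id and image URL are placeholders, not values taken from this PR.

```python
# Sketch only: MODEL_ID and the image URL are placeholders, not taken from this PR.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "OpenMMReasoner/OMR-7B"  # placeholder repo id

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Any RGB image works; this URL is purely illustrative.
image = Image.open(requests.get("https://example.com/problem.png", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Solve the problem in the image step by step."},
        ],
    }
]
# Build the chat-formatted prompt, then tensorize text + image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens so only the newly generated answer is decoded,
# which is what the batch_decode call visible in the diff operates on.
generated_ids_trimmed = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

In recent `transformers` releases, the `image-text-to-text` tag also lets the checkpoint be loaded through `pipeline("image-text-to-text", model=...)` as a one-liner, but the explicit processor flow above mirrors the snippet that the README's code block ends with.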