Improve model card: Add license, pipeline tag, library name, and extended content
#1
by nielsr (HF Staff) - opened

README.md CHANGED
````diff
@@ -1,9 +1,13 @@
 ---
-datasets:
-- OpenMMReasoner/OpenMMReasoner-RL-74K
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- OpenMMReasoner/OpenMMReasoner-RL-74K
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
+
 # OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
 
 <div align="center">
````
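The three metadata keys this hunk adds are the ones the Hub actually consumes: `license` powers the license badge and search filters, `pipeline_tag` files the model under the image-text-to-text task and selects the inference widget, and `library_name` determines which library's loading snippet the Hub displays. As a minimal sketch, the values can be read back with `huggingface_hub` (the repo id below is a placeholder, since the diff does not name the model repository):

```python
from huggingface_hub import ModelCard

# Placeholder repo id for illustration; the PR does not name the model repo.
card = ModelCard.load("OpenMMReasoner/OMR-7B")

# card.data exposes the YAML front matter fields added in this PR.
print(card.data.license)       # "apache-2.0"
print(card.data.pipeline_tag)  # "image-text-to-text"
print(card.data.library_name)  # "transformers"
```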
````diff
@@ -81,4 +85,48 @@ output_text = processor.batch_decode(
     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
 )
 print(output_text)
-```
+```
+
+## Evaluation Results
+
+Our **OpenMMReasoner-7B (OMR-7B)** model demonstrates strong performance across a comprehensive suite of multimodal reasoning benchmarks. With only 874K SFT samples and 74K RL samples—significantly less data than many competing methods—our model achieves state-of-the-art or highly competitive results on 9 out of 14 benchmark tasks. Notably, OMR-7B achieves **79.5%** on MathVista testmini (best among all models), **63.8%** on MathVerse testmini (best), and **79.0%** on WeMath loose (best), demonstrating the effectiveness of our transparent two-stage training recipe. This performance validates our emphasis on data quality and rigorous training design over simply scaling dataset size.
+
+| Model | SFT Data | RL Data | MathVista<br/>testmini | MathVision<br/>test | MathVision<br/>testmini | MathVerse<br/>testmini | DynaMath<br/>worst | WeMath<br/>loose | LogicVista<br/>test | MMMU<br/>val | MMMU-Pro<br/>standard | MMMU-Pro<br/>vision | CharXiv<br/>reas. | CharXiv<br/>desc. |
+|-------|----------|---------|------------------------|---------------------|-------------------------|------------------------|--------------------|--------------------|---------------------|--------------|-----------------------|---------------------|-------------------|-------------------|
+| VLAA-Thinker-Qwen2.5-7B | 126k | 25k | 68.0 | 26.4 | - | 48.2 | 22.4 | - | 48.5 | - | - | - | - | - |
+| ThinkLite-7B-VL | - | 11k | 71.6 | 24.6 | - | 42.9 | 16.5 | - | 42.7 | - | - | - | - | - |
+| VL-Rethinker-7B | - | 39k | 73.7 | 28.4 | - | 46.4 | 17.8 | - | 42.7 | - | 41.7 | - | - | - |
+| M2-Reasoning | 6.2M | 102k | 75.0 | 42.1 | - | 40.4 | - | - | 50.6 | - | - | - | - | - |
+| MMR1 | 1.6M | 15k | 72.0 | 31.8 | 29.0† | 55.4 | 27.9† | 68.0† | 48.9 | 52.4† | 41.1† | 37.1† | 43.5† | 71.1† |
+| OpenVLThinker-7B | 3.3k | 9.6k | 65.3 | 23.0 | 26.9† | 38.1 | 16.8 | 61.9† | 44.5 | 55.1† | 39.7† | 38.4† | 41.0† | 69.2† |
+| MM-Eureka-Qwen-7B | - | 15.6k | 72.6 | 28.1 | 32.1† | 45.4 | 23.0 | 59.8† | 46.3 | 54.4† | 40.1† | 37.1† | 42.4† | 74.1† |
+| OVR-7B | 2M | 300k | 72.1 | **51.8** | 38.2† | 54.6 | 33.5 | 64.8 | **54.8** | 51.8† | **50.2** | 29.1† | 44.5 | 73.6 |
+| **OMR-7B (ours)** | **874k** | **74k** | **79.5** | 43.6 | **38.8** | **63.8** | **34.9** | **79.0** | 50.0 | **57.8** | 44.1 | **40.6** | **46.1** | 73.5 |
+
+**Note:** Bold numbers indicate the best performance, and † indicates results reproduced using the authors' checkpoints.
+
+## Citation
+
+If you find OpenMMReasoner useful for your research and applications, please cite using this BibTeX:
+
+```bibtex
+@misc{zhang2025openmmreasonerpushingfrontiersmultimodal,
+  title={OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe},
+  author={Kaichen Zhang and Keming Wu and Zuhao Yang and Kairui Hu and Bin Wang and Ziwei Liu and Xingxuan Li and Lidong Bing},
+  year={2025},
+  eprint={2511.16334},
+  archivePrefix={arXiv},
+  primaryClass={cs.AI},
+  url={https://arxiv.org/abs/2511.16334},
+}
+```
+
+## Acknowledgements
+
+We gratefully acknowledge the following open-source projects that made this work possible:
+
+- [**lmms-eval**](https://github.com/EvolvingLMMs-Lab/lmms-eval) for providing the comprehensive evaluation framework for large multimodal models.
+- [**lmms-engine**](https://github.com/EvolvingLMMs-Lab/lmms-engine) for the SFT training infrastructure and tools.
+- [**verl**](https://github.com/volcengine/verl) for the reinforcement learning training framework.
+
+We thank the developers and contributors of these projects for their excellent work and for making their code publicly available.
````
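The second hunk shows only the tail of the card's usage example. For completeness, here is a self-contained sketch consistent with that tail, following the standard Qwen2.5-VL inference pattern in `transformers`; the repo id and image path are placeholders, not taken from the diff:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Placeholder repo id; substitute the actual OMR-7B checkpoint.
model_id = "OpenMMReasoner/OMR-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One image plus a text question, in the chat format the processor expects.
image = Image.open("problem.png")  # placeholder image path
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem in the image step by step."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=1024)
# Strip the prompt tokens so only the newly generated answer is decoded,
# matching the decode tail shown in the diff above.
generated_ids_trimmed = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```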
|