Enhance model card: Add pipeline tag, library, usage, and detailed metadata
This PR significantly enhances the model card for `MiroMind-M1-RL-32B` by:
* **Adding comprehensive metadata**: Including `pipeline_tag: text-generation`, `library_name: transformers`, and relevant `tags` (`mathematical-reasoning`, `qwen`, `large-language-model`) for improved discoverability and integration with the Hugging Face ecosystem. The `datasets` used for training are also added to the metadata for transparency. A short sketch after this summary shows how these fields become machine-readable via `huggingface_hub`.
* **Integrating a "Usage" section**: Providing a clear Python code example for direct inference with the `transformers` library, which was previously missing. This makes it much easier for users to get started with the model.
* **Adding a "Citation" section**: Including the correct BibTeX entry for the associated paper, ensuring proper attribution.
* **Streamlining content**: Consolidating detailed "Getting Started", "Training", and "Evaluation" instructions into a single section that refers to the GitHub repository, making the model card more concise and focused on the model itself while still providing access to comprehensive information.
These updates aim to improve the model's visibility and usability and to bring the card in line with best practices on the Hugging Face Hub.
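For context on why the metadata change matters: once merged, the YAML front matter is exposed as structured data that Hub tooling and client libraries can read. The snippet below is an illustrative sketch, not part of the PR, using the `huggingface_hub` `ModelCard` API to load the card and print the new fields; it assumes `huggingface_hub` is installed (`pip install huggingface_hub`) and that the PR has been merged.

```python
from huggingface_hub import ModelCard

# Load the model card for MiroMind-M1-RL-32B from the Hub
# (the new fields are only populated once this PR is merged).
card = ModelCard.load("miromind-ai/MiroMind-M1-RL-32B")

# The YAML front matter is parsed into structured, machine-readable metadata.
print(card.data.pipeline_tag)   # "text-generation"
print(card.data.library_name)   # "transformers"
print(card.data.tags)           # ["mathematical-reasoning", "qwen", "large-language-model"]
print(card.data.datasets)       # ["miromind-ai/MiroMind-M1-SFT-719K", "miromind-ai/MiroMind-M1-RL-62K"]
```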
The diff below shows the full set of changes to the model card (`README.md`):

````diff
@@ -1,9 +1,18 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- mathematical-reasoning
+- qwen
+- large-language-model
+datasets:
+- miromind-ai/MiroMind-M1-SFT-719K
+- miromind-ai/MiroMind-M1-RL-62K
 ---
 
 <!-- markdownlint-disable first-line-h1 -->
@@ -24,10 +33,9 @@ base_model:
 
 </div>
 
-
-
 # MiroMind-M1
 
+This repository contains the **MiroMind-M1-RL-32B** model, part of the **MiroMind-M1** series, which is a fully open-source reasoning language model built on the `Qwen-2.5` backbone. It is designed to advance mathematical reasoning, as described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://arxiv.org/abs/2507.14683).
 
 ## 🧾 Overview
 <div align="center">
@@ -37,6 +45,61 @@
 
 **MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.
 
+## Usage
+
+You can easily use MiroMind-M1 models with the `transformers` library for text generation.
+
+### Installation
+
+First, ensure you have the `transformers` library installed. For optimal performance, especially with larger models, consider installing `flash-attn`.
+
+```bash
+pip install transformers
+pip install flash-attn --no-build-isolation
+```
+
+### Text Generation Example
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "miromind-ai/MiroMind-M1-RL-32B"  # You can also use MiroMind-M1-RL-7B or MiroMind-M1-SFT-7B
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,  # Use torch.float16 for GPUs that don't support bfloat16
+    device_map="auto"
+)
+
+# Example mathematical reasoning prompt
+messages = [
+    {"role": "user", "content": "What is the 50th digit of Pi?"}
+]
+
+# Apply the model's chat template for proper input formatting
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+# Generate response
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512,  # Adjust based on expected output length
+    do_sample=True,      # Enable sampling for more creative outputs
+    temperature=0.7,     # Control randomness
+    top_p=0.9            # Control diversity
+)
+
+# Decode the generated tokens, skipping the input prompt
+response = tokenizer.batch_decode(generated_ids[:, model_inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
+print(response)
+```
 
 ## Evaluation
 
@@ -53,23 +116,22 @@
 
 ### MiroMind-M1-RL
 | Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
-|----------------------------------|--------|--------|---------|
-| DeepSeek-R1 | 79.8 | 70.0 | –
-| DeepSeek-R1-0528 | 91.4 | 87.5 | –
-| Qwen3-8B | 76.0 | 67.3 | –
-| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | –
-| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr>
-| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8
-| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5
-| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4
-| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr>
-| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | –
-| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6
-| Light-R1-7B-DS | 59.1 | 44.3 | –
-| Skywork-OR1-7B | 72.2 | 54.6 | –
+|----------------------------------|--------|--------|---------|
+| DeepSeek-R1 | 79.8 | 70.0 | – |
+| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
+| Qwen3-8B | 76.0 | 67.3 | – |
+| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
+| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
+| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
+| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
+| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
+| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr> |
+| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
+| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
+| Light-R1-7B-DS | 59.1 | 44.3 | – |
+| Skywork-OR1-7B | 72.2 | 54.6 | – |
 | **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |
 
-
 ## Resources
 ### Models
 [`MiroMind-M1-SFT-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-SFT-7B)<br>
@@ -79,3 +141,27 @@
 ### Data
 [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
 [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
+
+## Getting Started, Training & Evaluation
+
+For detailed instructions on installation, setting up a training environment, multi-node training, and running evaluation scripts for supported benchmarks, please refer to the comprehensive guides provided in the [MiroMind-M1 GitHub repository](https://github.com/MiroMindAsia/MiroMind-M1).
+
+## Acknowledgement
+
+The RL training is built from the wonderful [`verl`](https://github.com/volcengine/verl) project.
+
+## Citation
+
+If our work has been helpful to you, please consider citing it. Your citation serves as encouragement for our research.
+
+```bibtex
+@misc{luo2025miromindm1,
+      title={MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization},
+      author={Junyu Luo and Xiao Luo and Xiusi Chen and Zhiping Xiao and Wei Ju and Ming Zhang},
+      year={2025},
+      eprint={2507.14683},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2507.14683},
+}
+```
````
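Not part of the PR, but as a quick sanity check of the new `pipeline_tag: text-generation` and `library_name: transformers` metadata, the model should also load through the high-level `pipeline` helper. A minimal sketch, assuming a machine with enough GPU memory for the 32B weights and a recent `transformers` release that accepts chat-style message lists:

```python
import torch
from transformers import pipeline

# Load MiroMind-M1-RL-32B through the generic text-generation pipeline;
# the task name matches the pipeline_tag added in this PR.
generator = pipeline(
    "text-generation",
    model="miromind-ai/MiroMind-M1-RL-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input; recent transformers versions apply the chat template automatically.
messages = [{"role": "user", "content": "Compute 17 * 24 step by step."}]
result = generator(messages, max_new_tokens=256)

# For chat input, generated_text holds the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```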