nielsr (HF Staff) committed `79fa998` · verified · 1 parent: `1ec59d2`

Enhance model card: Add pipeline tag, library, usage, and detailed metadata

This PR enhances the model card for `MiroMind-M1-RL-32B` by:

* **Adding comprehensive metadata**: Including `pipeline_tag: text-generation`, `library_name: transformers`, and relevant `tags` (`mathematical-reasoning`, `qwen`, `large-language-model`) for improved discoverability and integration with the Hugging Face ecosystem. The `datasets` used for training are also added to the metadata for transparency.
* **Integrating a "Usage" section**: Providing a clear Python code example for direct inference with the `transformers` library, which was previously missing. This makes it much easier for users to get started with the model.
* **Adding a "Citation" section**: Including the correct BibTeX entry for the associated paper, ensuring proper attribution.
* **Streamlining content**: Consolidating detailed "Getting Started", "Training", and "Evaluation" instructions into a single section that refers to the GitHub repository, making the model card more concise and focused on the model itself while still providing access to comprehensive information.

These updates aim to improve the model's visibility, usability, and adherence to best practices on the Hugging Face Hub.
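The metadata fields listed above live in the YAML front matter at the top of `README.md` (the first hunk of the diff). The following is a minimal stdlib-only sketch of what that block encodes. It uses a hypothetical helper, `parse_front_matter`, which handles only the flat subset used in this card (top-level `key: value` pairs and `- item` lists). It is not how the Hub parses cards, and real tooling should use a proper YAML parser.

```python
# Hypothetical helper: parses only the simple front matter used in this model card
# (top-level "key: value" pairs and "- item" lists). Not a general YAML parser.

FRONT_MATTER = """---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- mathematical-reasoning
- qwen
- large-language-model
datasets:
- miromind-ai/MiroMind-M1-SFT-719K
- miromind-ai/MiroMind-M1-RL-62K
---

<!-- markdownlint-disable first-line-h1 -->
"""


def parse_front_matter(text: str) -> dict:
    """Return the metadata between the leading '---' delimiters as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the metadata block
            break
        if line.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key that had no inline value
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list of "- item" lines follows
            meta[current_key] = value if value else []
    return meta


meta = parse_front_matter(FRONT_MATTER)
print(meta["pipeline_tag"], meta["library_name"])
print(meta["datasets"])
```

Here `pipeline_tag` and `library_name` come back as strings, while `tags`, `datasets`, `language`, and `base_model` come back as lists.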

Files changed (1):
1. README.md +106 -20

README.md CHANGED
```diff
@@ -1,9 +1,18 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- mathematical-reasoning
+- qwen
+- large-language-model
+datasets:
+- miromind-ai/MiroMind-M1-SFT-719K
+- miromind-ai/MiroMind-M1-RL-62K
 ---
 
 <!-- markdownlint-disable first-line-h1 -->
@@ -24,10 +33,9 @@ base_model:
 
 </div>
 
-
-
 # MiroMind-M1
 
+This repository contains the **MiroMind-M1-RL-32B** model, part of the **MiroMind-M1** series, which is a fully open-source reasoning language model built on the `Qwen-2.5` backbone. It is designed to advance mathematical reasoning, as described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://arxiv.org/abs/2507.14683).
 
 ## 🧾 Overview
 <div align="center">
@@ -37,6 +45,61 @@ base_model:
 
 **MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.
 
+## 🚀 Usage
+
+You can easily use MiroMind-M1 models with the `transformers` library for text generation.
+
+### Installation
+
+First, ensure you have the `transformers` library installed. For optimal performance, especially with larger models, consider installing `flash-attn`.
+
+```bash
+pip install transformers
+pip install flash-attn --no-build-isolation
+```
+
+### Text Generation Example
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "miromind-ai/MiroMind-M1-RL-32B"  # You can also use MiroMind-M1-RL-7B or MiroMind-M1-SFT-7B
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,  # Use torch.float16 for GPUs that don't support bfloat16
+    device_map="auto"
+)
+
+# Example mathematical reasoning prompt
+messages = [
+    {"role": "user", "content": "What is the 50th digit of Pi?"}
+]
+
+# Apply the model's chat template for proper input formatting
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+# Generate response
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512,  # Adjust based on expected output length
+    do_sample=True,      # Enable sampling for more creative outputs
+    temperature=0.7,     # Control randomness
+    top_p=0.9            # Control diversity
+)
+
+# Decode the generated tokens, skipping the input prompt
+response = tokenizer.decode(generated_ids[0, model_inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
 
 ## 📊 Evaluation
 
@@ -53,23 +116,22 @@ base_model:
 
 ### MiroMind-M1-RL
 | Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
-|----------------------------------|--------|--------|---------|
-| DeepSeek-R1 | 79.8 | 70.0 | – |
-| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
-| Qwen3-8B | 76.0 | 67.3 | – |
-| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
-| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
-| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
-| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
-| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
-| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr> |
-| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
-| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
-| Light-R1-7B-DS | 59.1 | 44.3 | – |
-| Skywork-OR1-7B | 72.2 | 54.6 | – |
+|----------------------------------|--------|--------|---------|\
+| DeepSeek-R1 | 79.8 | 70.0 | – |\
+| DeepSeek-R1-0528 | 91.4 | 87.5 | – |\
+| Qwen3-8B | 76.0 | 67.3 | – |\
+| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |\
+| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |\
+| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |\
+| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |\
+| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |\
+| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr> |\
+| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |\
+| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |\
+| Light-R1-7B-DS | 59.1 | 44.3 | – |\
+| Skywork-OR1-7B | 72.2 | 54.6 | – |\
 | **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |
 
-
 ## 🔗 Resources
 ### Models
 [`MiroMind-M1-SFT-7B`](https://huggingface.co/miromind-ai/MiroMind-M1-SFT-7B)<br>
@@ -79,3 +141,27 @@ base_model:
 ### Data
 [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
 [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
+
+## 🛠 Getting Started, Training & ⚖️ Evaluation
+
+For detailed instructions on installation, setting up a training environment, multi-node training, and running evaluation scripts for supported benchmarks, please refer to the comprehensive guides provided in the [MiroMind-M1 GitHub repository](https://github.com/MiroMindAsia/MiroMind-M1).
+
+## 🙏 Acknowledgement
+
+The RL training is built from the wonderful [`verl`](https://github.com/volcengine/verl) project.
+
+## 📚 Citation
+
+If our work has been helpful to you, please consider citing it. Your citation serves as encouragement for our research.
+
+```bibtex
+@misc{luo2024miromindm1,
+  title={MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization},
+  author={Junyu Luo and Xiao Luo and Xiusi Chen and Zhiping Xiao and Wei Ju and Ming Zhang},
+  year={2024},
+  eprint={2507.14683},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2507.14683},
+}
+```
```
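The evaluation table in this diff reports AIME24/AIME25 as avg@64 and MATH500 as avg@5. The sketch below assumes avg@k means the per-problem accuracy over k independently sampled responses, averaged over all problems and reported as a percentage. That reading is our assumption, and the exact evaluation protocol is in the MiroMind-M1 GitHub repository. The function name `avg_at_k` and the toy data are hypothetical and illustrative only; they are not results from the paper.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """avg@k as a percentage (assumed definition).

    `correct[i][j]` is True if the j-th sampled answer to problem i was judged correct.
    Each problem's accuracy over its k samples is computed first, then averaged over problems.
    """
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * mean(per_problem)


# Toy example: 3 problems, k = 4 samples each (made-up data).
results = [
    [True, True, False, True],     # 3/4 correct
    [False, False, False, False],  # 0/4 correct
    [True, True, True, True],      # 4/4 correct
]
print(round(avg_at_k(results), 1))  # 58.3
```

When every problem has the same k, this equals the total number of correct samples divided by the total number of samples (here 7/12). Averaging many samples per problem reduces the variance of the score on small benchmarks such as AIME, which has only 30 problems per year.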