MiroMindM1


This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization.

MiroMind-M1

🧾 Overview

Figure: Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.

MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (SFT) on 719K curated problems and reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups openly released.

📊 Evaluation

MiroMind-M1-SFT

| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|---|---|---|---|---|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| MiMo-7B-SFT | MiMo-7B-Base | 58.7 | 44.3 | 93.0 |
| MiroMind-M1-SFT-7B | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |

† The AIME25 score marked with † (DeepSeek-R1-Distill) is from our own evaluation.

MiroMind-M1-RL

| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|---|---|---|---|
| DeepSeek-R1 | 79.8 | 70.0 | |
| DeepSeek-R1-0528 | 91.4 | 87.5 | |
| Qwen3-8B | 76.0 | 67.3 | |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | |
| MiMo-7B-RL | 68.2 | 55.4 | 95.8 |
| 32B models trained from the Qwen2.5 series | | | |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| MiroMind-M1-RL-32B | 77.5 | 65.6 | 96.4 |
| 7B models trained from the Qwen2.5 series | | | |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | |
| MiroMind-M1-SFT-7B | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | |
| Skywork-OR1-7B | 72.2 | 54.6 | |
| MiroMind-M1-RL-7B | 73.4 | 57.8 | 96.7 |
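
The tables report avg@64 and avg@5 without defining them. A common reading, assumed here rather than taken from this card, is that the benchmark is sampled k times and the per-run accuracies are averaged. The sketch below shows that arithmetic on made-up outcomes; the function name `avg_at_k` and the toy data are illustrative only.

```python
from statistics import mean


def avg_at_k(correct_by_run):
    """Average accuracy (in percent) over k independent runs.

    correct_by_run[r][p] is True when run r solved problem p.
    Assumption: avg@k means "mean of per-run accuracy over k runs".
    """
    if not correct_by_run:
        raise ValueError("need at least one run")
    per_run = [100.0 * sum(run) / len(run) for run in correct_by_run]
    return mean(per_run)


# Toy data: k = 2 runs over 4 problems (made-up outcomes, not real results).
toy_runs = [
    [True, False, True, True],    # run 1: 3 of 4 correct
    [True, False, False, True],   # run 2: 2 of 4 correct
]
print(avg_at_k(toy_runs))
```

Averaging over many runs reduces the sampling noise that a single run on a small benchmark such as AIME (30 problems) would carry.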

🔗 Resources

Models

MiroMind-M1-SFT-7B
MiroMind-M1-RL-7B
MiroMind-M1-RL-32B

Data

MiroMind-M1-SFT-719K
MiroMind-M1-RL-62K

🚀 Quickstart

You can explore the models using the Transformers library.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # pass input_ids together with the attention mask
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
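
When scoring responses automatically, it can help to pull the final answer out of the generated text. The sketch below assumes the model wraps its final answer in `\boxed{...}`, a common convention for math reasoning models that this card does not confirm; `extract_last_boxed` is a hypothetical helper, not part of the repository.

```python
def extract_last_boxed(text):
    # Return the contents of the last \boxed{...} in text, or None if absent.
    # Assumption: the final answer is wrapped in \boxed{...}.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1          # we are inside the opening brace of \boxed{
    collected = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected)
        collected.append(ch)
        i += 1
    return None        # braces never balanced


sample = "Subtract 5, then divide by 2, so $x = \\boxed{3}$."
print(extract_last_boxed(sample))
```

The brace counter keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact, which a simple regular expression would truncate.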

🛠 Getting Started

Installation

venv environment:

git clone https://github.com/MiroMindAsia/MiroMind-M1.git
cd MiroMind-M1

# Create a Python 3.10 virtual environment.
python3.10 -m pip install virtualenv
virtualenv -p python3.10 venv
source venv/bin/activate

# Install dependencies.
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install numpy psutil ninja packaging cmake
pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
pip3 install -e .

🏋️ Training

Multi-Node Training

Here is a quick guide to starting Ray for multi-node training.

On the head node

ray stop
ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0

On other nodes

ray stop
ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
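
Once all nodes have joined, it may be worth confirming that the cluster exposes the expected number of GPUs before launching a job. The sketch below uses `ray.init(address="auto")` and `ray.cluster_resources()` from Ray's Python API; the helper names and the 8-GPUs-per-node default (taken from the commands above) are illustrative assumptions.

```python
def gpu_shortfall(resources, expected_nodes, gpus_per_node=8):
    # Number of GPUs still missing from the cluster (0 means all are present).
    # `resources` is a dict shaped like the one ray.cluster_resources() returns.
    expected = expected_nodes * gpus_per_node
    available = int(resources.get("GPU", 0))
    return max(0, expected - available)


def check_live_cluster(expected_nodes):
    # Connect to the running Ray cluster and report missing GPUs.
    # Run this on a node that has already executed `ray start`.
    import ray  # imported lazily so the pure helper above works without Ray

    ray.init(address="auto")
    try:
        return gpu_shortfall(ray.cluster_resources(), expected_nodes)
    finally:
        ray.shutdown()
```

For a two-node setup, `check_live_cluster(2)` returning `0` would indicate that all 16 GPUs are registered with the head node.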

Start Training

First, set the following variables:

export MODEL_PATH=YOUR_MODEL_PATH
export CKPTS_DIR=YOUR_CKPTS_DIR
export TRAIN_FILE=YOUR_TRAIN_FILE
export TEST_FILE=YOUR_TEST_FILE
export HOME=YOUR_HOME_PATH
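
A missing or placeholder variable is easier to catch before a multi-node job starts than after. As a minimal sketch, the hypothetical helper below lists which of the five variables above are unset or still hold a `YOUR_...` placeholder; it is not part of the repository.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("MODEL_PATH", "CKPTS_DIR", "TRAIN_FILE", "TEST_FILE", "HOME")


def missing_vars(env=None):
    # Return the required variables that are unset, empty, or placeholders.
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing
```

Calling `missing_vars()` with no arguments inspects the current process environment; an empty list means every variable has been set.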

Then run the script below to start training:

bash m1_train_script/campo_32b.sh

⚖️ Run Evaluation

We provide ready-to-use evaluation scripts in the m1_eval_script/ directory for mathematical reasoning benchmarks.

Quick Start

# Evaluate on AIME 2024
bash m1_eval_script/evaluate_7b_aime24.sh

# Evaluate on AIME 2025  
bash m1_eval_script/evaluate_7b_aime25.sh

# Evaluate on Math-500
bash m1_eval_script/evaluate_7b_math500.sh

Supported Benchmarks

| Dataset | Script | Standard Runs |
|---|---|---|
| AIME 2024 | evaluate_7b_aime24.sh | 64 runs |
| AIME 2025 | evaluate_7b_aime25.sh | 64 runs |
| Math-500 | evaluate_7b_math500.sh | 5 runs |

Results

Results are saved in results/[model_name]/[dataset_name]/ with:

  • average_accuracy.txt: Final accuracy score
  • run[X]_inference_eval_results.csv: Detailed results
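
To compare several models or datasets at once, the per-directory scores can be gathered into one mapping. The sketch below walks the `results/[model_name]/[dataset_name]/` layout described above and reads each `average_accuracy.txt` as raw text, because the exact file format is not documented here; `collect_accuracies` is a hypothetical helper.

```python
from pathlib import Path


def collect_accuracies(results_root):
    # Map (model_name, dataset_name) to the raw text of average_accuracy.txt.
    # The file contents are returned unparsed, since the format is an unknown.
    root = Path(results_root)
    found = {}
    for path in sorted(root.glob("*/*/average_accuracy.txt")):
        model_name, dataset_name = path.parts[-3], path.parts[-2]
        found[(model_name, dataset_name)] = path.read_text().strip()
    return found
```

Running `collect_accuracies("results")` from the repository root would then yield one entry per evaluated model and dataset pair.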

🙏 Acknowledgement

The RL training is built on the wonderful verl project.
