This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization.

MiroMind-M1

🧾 Overview

Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.

MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (SFT) on 719K curated problems and reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups openly released.

📊 Evaluation

MiroMind-M1-SFT

Model	Initial Checkpoint	AIME24 (avg@64)	AIME25 (avg@64)	MATH500 (avg@5)
DeepSeek-R1-Distill	Qwen2.5-Math-7B	55.5	40.4†	92.8
OpenThoughts	Qwen2.5-7B-Instruct	31.3	23.3	83.2
Open-R1	Qwen2.5-Math-7B-Instruct	36.7	40.0	90.6
Synthetic-1	Qwen2.5-7B-Instruct	30.0	26.6	85.6
MiMo-7B-SFT	MiMo-7B-Base	58.7	44.3	93.0
MiroMind-M1-SFT-7B	Qwen2.5-Math-7B	60.4	45.0	94.6

† The AIME25 score for DeepSeek-R1-Distill comes from our own evaluation.
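
The avg@64 and avg@5 columns can be illustrated with a short sketch. It assumes avg@k denotes accuracy averaged over k independent sampling runs, which is the usual reading of the term; the function name `avg_at_k` and the toy data are hypothetical and are not taken from the evaluation scripts.

```python
from statistics import mean


def avg_at_k(correct_by_run):
    """Return avg@k as a percentage.

    correct_by_run holds k runs; each run is a list of booleans,
    one per benchmark problem (True means the answer was correct).
    """
    if not correct_by_run:
        raise ValueError("need at least one run")
    # Accuracy of each run, then the mean over all k runs.
    per_run_accuracy = [
        mean(1.0 if is_correct else 0.0 for is_correct in run)
        for run in correct_by_run
    ]
    return 100.0 * mean(per_run_accuracy)


# Toy data (assumption): 2 runs over 4 problems.
runs = [
    [True, False, True, True],
    [True, True, False, True],
]
print(f"avg@{len(runs)} = {avg_at_k(runs):.1f}")
```

Averaging over many runs reduces sampling noise, which matters on small benchmarks such as AIME.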

MiroMind-M1-RL

Model	AIME24 (avg@64)	AIME25 (avg@64)	MATH500 (avg@5)
DeepSeek-R1	79.8	70.0	–
DeepSeek-R1-0528	91.4	87.5	–
Qwen3-8B	76.0	67.3	–
DeepSeek-R1-0528-Qwen3-8B	86.0	76.3	–
MiMo-7B-RL	68.2	55.4	95.8

32B Models trained from Qwen2.5 series

DeepSeek-R1-Distill-Qwen-32B	70.8	52.1	95.8
Skywork-OR1-32B-Preview	77.1	68.2	97.5
MiroMind-M1-RL-32B	77.5	65.6	96.4

7B Models trained from Qwen2.5 series

DeepSeek-R1-Distill-Qwen-7B	55.5	39.2	–
MiroMind-M1-SFT-7B	60.4	45.0	94.6
Light-R1-7B-DS	59.1	44.3	–
Skywork-OR1-7B	72.2	54.6	–
MiroMind-M1-RL-7B	73.4	57.8	96.7

🔗 Resources

Models

MiroMind-M1-SFT-7B
MiroMind-M1-RL-7B
MiroMind-M1-RL-32B

Data

MiroMind-M1-SFT-719K
MiroMind-M1-RL-62K

🚀 Quickstart

You can explore the models using the Transformers library.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # pass input_ids together with the attention mask
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

🛠 Getting Started

Installation

venv environment:

git clone https://github.com/MiroMindAsia/MiroMind-M1.git
cd MiroMind-M1

# Install Python 3.10 environment.
python3.10 -m pip install virtualenv
virtualenv -p python3.10 venv
source venv/bin/activate

# Install dependencies.
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install numpy psutil ninja packaging cmake
pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
pip3 install -e .

🏋️ Training

Multi-Node Training

Here is a quick guide to starting Ray for multi-node training.

On the head node

ray stop
ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0

On other nodes

ray stop
ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
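
Once every node has joined, it can help to confirm that the cluster reports the expected number of GPUs before launching a job. The following is a minimal sketch, not part of the repository: the helper names `missing_gpus` and `check_cluster` are hypothetical, and the default of 8 GPUs per node simply mirrors the `--num-gpus 8` flag above.

```python
def missing_gpus(resources, expected_nodes, gpus_per_node=8):
    """Return how many GPUs the cluster still lacks (0 means ready)."""
    expected = expected_nodes * gpus_per_node
    available = int(resources.get("GPU", 0))
    return max(expected - available, 0)


def check_cluster(expected_nodes):
    """Attach to the running Ray cluster and count missing GPUs."""
    import ray  # imported lazily so missing_gpus works without Ray installed

    ray.init(address="auto")  # connect to the cluster started with `ray start`
    try:
        return missing_gpus(ray.cluster_resources(), expected_nodes)
    finally:
        ray.shutdown()
```

Calling `check_cluster(2)` on the head node of a two-node setup should return 0 when both nodes are connected.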

Start Training

First, set the following environment variables:

export MODEL_PATH=YOUR_MODEL_PATH
export CKPTS_DIR=YOUR_CKPTS_DIR
export TRAIN_FILE=YOUR_TRAIN_FILE
export TEST_FILE=YOUR_TEST_FILE
export HOME=YOUR_HOME_PATH

Then run the following script to start training:

bash m1_train_script/campo_32b.sh
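
Before launching the script, a quick check that the variables listed above are set can save a failed run. This is a hedged sketch rather than part of the repository: the helper `unset_vars` is hypothetical, and treating values that still start with `YOUR_` as unset is an assumption based on the placeholders shown above.

```python
import os

# Variables named in the export lines above.
REQUIRED_VARS = ["MODEL_PATH", "CKPTS_DIR", "TRAIN_FILE", "TEST_FILE", "HOME"]


def unset_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are empty, absent, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "")
        # Assumption: a value such as YOUR_MODEL_PATH was never replaced.
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing


missing = unset_vars()
if missing:
    print("unset variables:", ", ".join(missing))
else:
    print("all training variables are set")
```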

⚖️ Run Evaluation

We provide ready-to-use evaluation scripts in the m1_eval_script/ directory for mathematical reasoning benchmarks.

Quick Start

# Evaluate on AIME 2024
bash m1_eval_script/evaluate_7b_aime24.sh

# Evaluate on AIME 2025
bash m1_eval_script/evaluate_7b_aime25.sh

# Evaluate on Math-500
bash m1_eval_script/evaluate_7b_math500.sh

Supported Benchmarks

Dataset	Script	Standard Runs
AIME 2024	`evaluate_7b_aime24.sh`	64 runs
AIME 2025	`evaluate_7b_aime25.sh`	64 runs
Math-500	`evaluate_7b_math500.sh`	5 runs

Results

Results are saved in results/[model_name]/[dataset_name]/ with:

average_accuracy.txt: Final accuracy score
run[X]_inference_eval_results.csv: Detailed results
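
To compare several runs at a glance, the per-dataset score files can be gathered with a few lines of Python. This is a sketch under assumptions: it relies only on the directory layout described above, the function name `collect_accuracies` is hypothetical, and the file contents are returned as raw text because the exact format of average_accuracy.txt is not specified here.

```python
from pathlib import Path


def collect_accuracies(results_root="results"):
    """Map (model_name, dataset_name) to the text of average_accuracy.txt."""
    root = Path(results_root)
    scores = {}
    if not root.is_dir():
        return scores  # nothing has been evaluated yet
    # Layout from the section above: results/[model_name]/[dataset_name]/
    for path in sorted(root.glob("*/*/average_accuracy.txt")):
        dataset_dir = path.parent
        key = (dataset_dir.parent.name, dataset_dir.name)
        scores[key] = path.read_text().strip()
    return scores


for (model, dataset), text in collect_accuracies().items():
    print(f"{model}\t{dataset}\t{text}")
```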

🙏 Acknowledgement

The RL training is built on the wonderful verl project.
