MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Abstract
We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code, and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.
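To make the sparse-reward point concrete, here is a minimal sketch of how difficulty-aware partial credit on code problems might differ from an all-or-nothing reward. The abstract does not give the exact formula, so everything here is an illustrative assumption: the `CaseResult` type, the integer difficulty tiers, and the weighting rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one test case (hypothetical structure for illustration)."""
    name: str
    difficulty: int  # assumed tier: 1 = easy, larger = harder
    passed: bool


def sparse_reward(results):
    """All-or-nothing baseline: 1.0 only if every test case passes."""
    return 1.0 if all(r.passed for r in results) else 0.0


def difficulty_weighted_reward(results):
    """Partial credit: each passed case contributes its difficulty weight.

    Assumed weighting, not the paper's exact scheme.
    """
    total = sum(r.difficulty for r in results)
    if total == 0:
        return 0.0
    earned = sum(r.difficulty for r in results if r.passed)
    return earned / total


results = [
    CaseResult("small_input", 1, True),
    CaseResult("edge_case", 2, True),
    CaseResult("large_input", 3, False),
]
print(sparse_reward(results))               # 0.0
print(difficulty_weighted_reward(results))  # 0.5
```

Under these assumptions, a solution that passes the two easier cases but fails the hardest one gets a non-zero signal instead of zero, which is the kind of denser feedback the abstract alludes to.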
Community
We present MiMo, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. Pre-trained on 25 trillion tokens, MiMo-7B-Base possesses exceptional reasoning potential. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks.
Thanks for your excellent work :) I believe I may have been mistakenly included in the list of authors, possibly due to a name similarity. To avoid any misunderstanding, would it be possible to kindly update the author list with the correct individual?
an audio overview for learning on the go: https://youtu.be/y6mSdLgJYQY
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study (2025)
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models (2025)
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model (2025)
- Llama-Nemotron: Efficient Reasoning Models (2025)
- SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM (2025)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 (2025)
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't (2025)