---
license: mit
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- LRM
- hybrid_reasoning
- efficient_reasoning
---

# AdaptThink: LLM Can Learn When to Think

🤗 HF Collections • 💻 Github Repo • 📃 Paper

## 🔍 Table of Contents

- [🤖️ AdaptThink](#adapt_think)
- [⚙️ Released Models](#model)
- [📊 Evaluation](#evaluation)
- [📝 Citation](#citation)

## 🤖️ AdaptThink

We present **AdaptThink**, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between **Thinking** and **NoThinking** modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when it judges the problem to be challenging; for simpler questions, it bypasses the thinking process and directly produces a concise final solution. This approach substantially reduces inference cost while further improving overall performance.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/JaeJiBwLkcwAuexRAkLX5.png)

## ⚙️ Released Models

### All Available Datasets and Models

We apply the AdaptThink algorithm to DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ ranging from 0 to 0.1, and to DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which reduces inference cost further but also diminishes the resulting improvement in accuracy. All the trained models are available on Hugging Face; a minimal inference sketch is provided at the end of this card.

| Name | HF Repo |
|---|---|
| AdaptThink-1.5B-delta0 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0) |
| AdaptThink-1.5B-delta0.01 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.01) |
| AdaptThink-1.5B-delta0.02 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.02) |
| AdaptThink-1.5B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.05) |
| AdaptThink-1.5B-delta0.075 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.075) |
| AdaptThink-1.5B-delta0.1 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.1) |
| AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |

## 📊 Evaluation Results

We list our evaluation results as follows:

##### 1. Comparison with existing methods for efficient reasoning on mathematics datasets

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/ZLV8ZfEet1dp-4jyzBxiG.png)

##### 2. NoThinking response ratio and accuracy across different difficulty levels on MATH500

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/GUNfW9qO2aaT9_lo1XXPf.png)

##### 3. Comparison of different $\delta$ values

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/RXrXwxVSAYlR3-_t0GUwV.png)

##### 4. Evaluation results on MMLU

## 📝 Citation

If you find our work useful, please consider citing AdaptThink:

```
@article{zhang2025adapt_think,
  title={AdaptThink: LLM Can Learn When to Think},
  author={Jiajie Zhang and Nianyi Lin and Lei Hou and Ling Feng and Juanzi Li},
  journal={arXiv preprint arXiv:2505.13417},
  url={https://arxiv.org/abs/2505.13417},
  year={2025}
}
```
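
## 🚀 Quick Start

The snippet below is a minimal inference sketch using 🤗 Transformers: it loads one of the released checkpoints and generates an answer through the standard chat template. The checkpoint name, sampling parameters, and prompt format here are illustrative assumptions on our part; please refer to the Github Repo linked above for the exact inference and evaluation setup.

```python
# Minimal inference sketch (assumptions: standard chat template and generic
# sampling settings; not necessarily the exact setup used in the paper).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "THU-KEG/AdaptThink-7B-delta0.05"  # any released checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# A simple question: AdaptThink is expected to skip the thinking process
# and directly produce a concise final solution.
messages = [{"role": "user", "content": "What is 15% of 240?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

For a harder problem (e.g., a competition-level math question), the same call is expected to yield a full thinking trace before the final answer, since the model chooses between the two modes per input.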