---
license: mit
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- LRM
- hybrid_reasoning
- efficient_reasoning
---

# AdaptThink: LLM Can Learn When to Think
🤗 HF Collections • 💻 Github Repo • 📃 Paper
## 🔍 Table of Contents
- [🤖️ AdaptThink](#adapt_think)
- [⚙️ Released Models](#model)
- [📊 Evaluation](#evaluation)
- [📝 Citation](#citation)

## 🤖️ AdaptThink
We present **AdaptThink**, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between **Thinking** and **NoThinking** modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when it judges a problem to be challenging; for simpler questions, it bypasses the thinking process and directly produces a concise final solution. This approach substantially reduces inference costs while also improving overall performance.

## ⚙️ Released Models

### All Available Datasets and Models
We apply the AdaptThink algorithm to DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ ranging from 0 to 0.1, and to DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which further reduces inference costs but also shrinks the resulting accuracy improvement. All trained models are available on Hugging Face.

| Name | HF Repo |
|---|---|
| AdaptThink-1.5B-delta0 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0) |
| AdaptThink-1.5B-delta0.01 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.01) |
| AdaptThink-1.5B-delta0.02 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.02) |
| AdaptThink-1.5B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.05) |
| AdaptThink-1.5B-delta0.075 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.075) |
| AdaptThink-1.5B-delta0.1 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.1) |
| AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |

## 📊 Evaluation Results
We list our evaluation results as follows:

##### 1. Comparison with existing methods for efficient reasoning on mathematics datasets

##### 2. NoThinking response ratio and accuracy across different difficulty levels on MATH500

##### 3. Comparison of different $\delta$ values

##### 4. Evaluation results on MMLU
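To make the Thinking/NoThinking distinction, the role of $\delta$, and the per-difficulty NoThinking ratio more concrete, here is a minimal Python sketch. It rests on several assumptions that are not stated in this card: (1) a NoThinking response is one whose thinking segment is empty, so its text starts directly with `</think>` (optionally after a leading `<think>` tag); (2) $\delta$ is illustrated as a simple additive bonus on top of a correctness reward measured relative to a reference model's mean accuracy, which is a simplification for intuition and not necessarily the exact training objective (see the paper for that); (3) the helpers `is_nothinking`, `shaped_reward`, `Record`, and `stats_by_level` and the record format are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed delimiters of the thinking segment (DeepSeek-R1-style output).
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def is_nothinking(response: str) -> bool:
    """Assumption: a NoThinking response has an empty thinking segment,
    i.e. after an optional leading <think> tag and whitespace, the text
    starts directly with </think>."""
    text = response.lstrip()
    if text.startswith(THINK_OPEN):
        text = text[len(THINK_OPEN):].lstrip()
    return text.startswith(THINK_CLOSE)


def shaped_reward(correct: bool, ref_accuracy: float,
                  nothinking: bool, delta: float) -> float:
    """Hypothetical illustration of the delta trade-off: correctness relative
    to a reference model's mean accuracy on the same problem, plus a bonus
    of `delta` when the response skips thinking."""
    bonus = delta if nothinking else 0.0
    return float(correct) - ref_accuracy + bonus


@dataclass
class Record:
    """Hypothetical evaluation record for one problem."""
    level: int       # difficulty level, e.g. 1-5 on MATH500
    response: str    # full model output
    correct: bool    # whether the final answer matched the reference


def stats_by_level(records: list[Record]) -> dict[int, tuple[float, float]]:
    """Return {level: (nothinking_ratio, accuracy)}, sorted by level."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, nothinking, correct]
    for r in records:
        c = counts[r.level]
        c[0] += 1
        c[1] += int(is_nothinking(r.response))
        c[2] += int(r.correct)
    return {lvl: (c[1] / c[0], c[2] / c[0]) for lvl, c in sorted(counts.items())}


records = [
    Record(1, "</think>\nThe answer is 4.", True),
    Record(1, "</think>\nThe answer is 5.", False),
    Record(5, "Let me work through this step by step...\n</think>\nThe answer is 12.", True),
]
print(stats_by_level(records))  # {1: (1.0, 0.5), 5: (0.0, 1.0)}
```

In this sketch, raising `delta` adds the same fixed amount to the reward of every NoThinking response, so a NoThinking answer can remain preferable even when it is slightly less likely to be correct. This is one way to read the card's observation that larger $\delta$ values yield more NoThinking responses at the cost of a smaller accuracy gain.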