---
license: apache-2.0
datasets:
- open-thoughts/OpenThoughts2-1M
- Vinnnf/Hybrid-OpenThoughts2-1M-1.5B
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
library_name: transformers
---

# Thinkless: LLM Learns When to Think

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646a1939c37ca1e12308fe81/SRxJKkSuC0y-oMB7SFeR6.png)
| Resource | Link |
|----------|------|
| 📄 Paper Link | [ArXiv](https://arxiv.org/abs/2505.13379) |
| 💻 RL Code | [VainF/Thinkless](https://github.com/VainF/Thinkless) |
| 💻 SFT Code | [VainF/Reasoning-SFT](https://github.com/VainF/Reasoning-SFT) |
| 🤖 RL Model | [Thinkless-1.5B-RL-DeepScaleR](https://huggingface.co/Vinnnf/Thinkless-1.5B-RL-DeepScaleR) |
| 🐣 Warmup Model | [Thinkless-1.5B-Warmup](https://huggingface.co/Vinnnf/Thinkless-1.5B-Warmup) |
| 📊 Data for Warmup | [Hybrid-OpenThoughts2-1M-1.5B](https://huggingface.co/datasets/Vinnnf/Hybrid-OpenThoughts2-1M-1.5B) |
| 📊 Data for RL | [agentica-org/DeepScaleR-Preview-Dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset) |
## Introduction

> [!NOTE]
> ***Can LLMs learn when to think?***

We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning based on both task complexity and the model's ability. Thinkless is trained under a reinforcement learning paradigm and employs two control tokens, `<short>` for concise responses and `<think>` for detailed reasoning. At the core of our method is a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into two components: (1) a control token loss that governs the selection of the reasoning mode, and (2) a response loss that improves the accuracy of the generated answers. This decoupled formulation enables fine-grained control over the contribution of each objective, stabilizing training and effectively preventing the collapse observed in vanilla GRPO. (An illustrative, unofficial sketch of this objective is included at the end of this card.) Empirically, on benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless reduces the usage of long-chain thinking by 50%-90%, significantly lowering the computational cost of Reasoning Language Models.

## Pipeline

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646a1939c37ca1e12308fe81/3mx8EJUyOvCtxPnYTcwbS.png)

## QuickStart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-Warmup"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = f"{instruction}\nThe arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Append the control token that selects the reasoning mode;
# set think_mode = False for a concise, short-form answer.
think_mode = True
if think_mode:
    text = f"{text}<think>"
else:
    text = f"{text}<short>"

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Keep only the newly generated tokens, dropping the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

num_tokens = len(generated_ids[0])
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text + response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
```

A small helper for running both modes side by side is sketched at the end of this card.

## Citation

If you find this work helpful, please cite:

```
@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}
```
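## Comparing the Two Modes

The helper below is our own sketch, not part of the official repo (the name `generate_with_mode` is hypothetical). It reuses `model`, `tokenizer`, and `messages` from the QuickStart to run the same question under both control tokens and report how many tokens each mode generates.

```python
def generate_with_mode(messages, think_mode, max_new_tokens=4096):
    """Generate a response with an explicit reasoning-mode control token."""
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Append the control token so the response starts in the chosen mode.
    text += "<think>" if think_mode else "<short>"
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_ids = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True), len(new_ids)

for mode in (True, False):
    _, n_tokens = generate_with_mode(messages, think_mode=mode)
    print(f"think_mode={mode}: {n_tokens} generated tokens")
```

Long-form responses typically run far longer than short-form ones; the RL stage trains the model to invoke them only when the problem actually needs it.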
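## DeGRPO at a Glance (Unofficial Sketch)

For readers curious how the decoupled objective from the introduction can be written down, the snippet below is a minimal sketch under our own assumptions: the first sampled token is the control token (`<think>` or `<short>`), its loss term is kept separate from the response tokens, and a coefficient `alpha` (a placeholder value here) balances the two. This is not the authors' training code; see [VainF/Thinkless](https://github.com/VainF/Thinkless) for the official implementation.

```python
import torch

def degrpo_loss(token_logps, advantages, response_mask, alpha=0.001):
    """Sketch of a decoupled GRPO objective (shapes are our assumptions).

    token_logps:   (B, T) log-probs of the sampled tokens; position 0 is the
                   control token (<think> or <short>), the rest is the response.
    advantages:    (B,) group-relative advantages, as in GRPO.
    response_mask: (B, T-1) float mask, 1.0 for real response tokens, 0.0 for padding.
    alpha:         weight on the mode-selection term (placeholder value).
    """
    # (1) Control-token loss: governs which reasoning mode is selected.
    ctrl_loss = -(advantages * token_logps[:, 0]).mean()

    # (2) Response loss: improves answer accuracy, normalized per sequence
    # so that long responses do not dominate the gradient.
    per_seq = (token_logps[:, 1:] * response_mask).sum(dim=1) \
        / response_mask.sum(dim=1).clamp(min=1.0)
    resp_loss = -(advantages * per_seq).mean()

    return alpha * ctrl_loss + resp_loss
```

Weighting the single control token separately from the much longer response is what gives fine-grained control over mode selection versus answer accuracy, which the introduction credits for preventing the collapse seen in vanilla GRPO.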