---
license: mit
datasets:
- Nickyang/ConciseR-Data
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-Math-7B
pipeline_tag: text-generation
---

# Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning

[![Paper](https://img.shields.io/badge/paper-5f16a8?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2505.21178)
## 🎉 News

- **[2025/05/27]** 🎉 We release [**ConciseR-Zero-7B**](https://huggingface.co/Nickyang/ConciseR-Zero-7B) and [**ConciseR-Zero-7B-Preview**](https://huggingface.co/Nickyang/ConciseR-Zero-7B-Preview).

## Usage

```python
import vllm


def apply_template(question: str) -> str:
    # R1-Zero-style prompt: the model reasons inside <think> ... </think>,
    # answers inside <answer> ... </answer>, and puts the final result in \boxed{}.
    return ("""<|startoftext|>A conversation between User and Assistant. The User asks a question, and the Assistant solves it. \
The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. \
The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, \
i.e., <think> reasoning process here </think> <answer> answer here </answer>. \
Please reason step by step, and put your final answer within \\boxed{}.
User: {query}
Assistant: """.replace("{query}", question))


model_name = "Nickyang/ConciseR-Zero-7B-Preview"

# Sample 32 completions per prompt.
sampling_params = vllm.SamplingParams(
    n=32,
    temperature=0.6,
    top_p=1.0,
    max_tokens=3072,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

prompts = [
    "How many positive whole-number divisors does 196 have?",
]
prompts = list(map(apply_template, prompts))

outputs = model.generate(prompts, sampling_params)
print(outputs)
```

## Citation

```latex
@misc{song2025conciser,
      title={Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning},
      author={Mingyang Song and Mao Zheng},
      year={2025},
      eprint={2505.21178},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21178},
}
```
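As a follow-up to the usage snippet above: if you want the final answers rather than the raw `RequestOutput` objects, the sketch below pulls the last `\boxed{...}` value out of each sampled completion and takes a majority vote over the `n=32` samples. The `extract_boxed` helper and its regex are illustrative, not part of the released code, and assume a simple answer with no nested braces.

```python
import re
from collections import Counter


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in a completion, if any.

    Illustrative helper; the regex does not handle nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


# `outputs` is the list of vllm.RequestOutput objects returned by model.generate(...) above.
for request_output in outputs:
    answers = [extract_boxed(completion.text) for completion in request_output.outputs]
    answers = [a for a in answers if a is not None]
    if answers:
        # Majority vote over the sampled completions (n=32 in the snippet above).
        answer, votes = Counter(answers).most_common(1)[0]
        print(f"answer: {answer} ({votes}/{len(request_output.outputs)} samples agree)")
```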