AutoL2S-7B

This is the official model repository for AutoL2S-7B, a model fine-tuned for efficient reasoning based on Qwen/Qwen2.5-7B-Instruct.

πŸ’‘ Overview

AutoL2S automatically switches between short and long reasoning paths based on input complexity. Auto Long-Short Reasoning (AutoL2S) is a dynamic, model-agnostic framework that enables LLMs to compress their generated reasoning paths according to the difficulty of the question. AutoL2S is a learned paradigm: the LLM itself decides when longer reasoning is necessary and when shorter reasoning suffices, by training on data annotated with our proposed method, which includes both long and short CoT paths and a special <EASY> token (<specialLong> in the implementation). This token indicates when the model can skip generating lengthy CoT reasoning. The proposed annotation strategy enhances the LLM's ability to generate shorter CoT reasoning paths with improved quality after training.
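The switching behavior described above can be sketched in a few lines. This is an illustrative sketch under assumed names, not the repository's implementation: `generate` is a hypothetical stand-in for a single LLM sampling call, and `route` mimics the two-stage control flow around the <specialLong> token.

```python
# Illustrative sketch of AutoL2S-style routing (not the repository's code).
# Stage 1: sample with "<specialLong>" as a stop string; if the token appears,
# the question is deemed hard. Stage 2: continue generation in long-CoT mode.

def route(generate, prompt, long_token="<specialLong>"):
    """`generate` is a hypothetical stand-in for one LLM call."""
    draft = generate(prompt, stop=[long_token])
    if draft.endswith(long_token):
        # Hard question: resume after the token to produce the full long CoT.
        return generate(prompt + draft)
    # Easy question: the short reasoning path suffices.
    return draft
```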

This repository contains:

  • Model weights
  • Configuration files
  • Necessary scripts in the examples/ directory


🧩 Dependencies

We recommend using the model with vLLM.
The code has been tested with:

vLLM == 0.6.2
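Assuming a standard Python environment with pip available, the tested version can be installed with:

```shell
# Install the vLLM version this code was tested against.
pip install vllm==0.6.2
```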

πŸš€ How to Use

Run the inference example:

cd examples
python run_inference.py

Alternatively, download examples/prefixLLM.py and examples/template.py from this repository and place them in your working directory:

from vllm import SamplingParams
from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER

llm = PrefixLLM(model="amandaa/AutoL2S-7b")
max_tokens, temp = 32768, 0.7
sampling_params_route = SamplingParams(max_tokens=max_tokens, temperature=temp, stop=["<specialLong>"], include_stop_str_in_output=True)
sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question}
]
responses = llm.route_chat(messages=messages, sampling_params_route=sampling_params_route, sampling_params_force_think=sampling_params_force_think, use_tqdm=True, trigger_word=SHORT_TRIGGER)

print(SHORT_TRIGGER + responses[0].outputs[0].text)

πŸ” Citation

If you use this model in your work, please consider citing:

@article{luo2025autol2s,
  title={AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author={Luo, Feng and Chuang, Yu-Neng and Wang, Guanchu and Le, Hoang Anh Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and others},
  journal={arXiv preprint arXiv:2505.22662},
  year={2025}
}