---
license: apache-2.0
datasets:
- GAIR/LIMO
base_model:
- Qwen/Qwen3-14B
tags:
- limo
thumbnail: https://huggingface.co/mrm8488/Qwen3-14B-ft-limo/resolve/main/logo-min.png
---

# Qwen3-14B-ft-limo
![limón logo](https://huggingface.co/mrm8488/Qwen3-14B-ft-limo/resolve/main/logo-min.png)
### Overview

This model is a fine-tuned version of Qwen3-14B using the [LIMO](https://github.com/GAIR-NLP/LIMO) training recipe (and dataset). We use `Qwen3-14B` instead of `Qwen2.5-32B-Instruct` (the base model in the original LIMO work).

---

### Training Configuration

### Training Args
- **Dataset**: `GAIR/LIMO`
- **Per Device Train Batch Size**: 2
- **Per Device Eval Batch Size**: 2
- **Gradient Accumulation Steps**: 4
  *The effective batch size is increased via gradient accumulation.*
- **Warmup Steps**: 5
- **Total Training Epochs**: 8 (15 in the original experiment)
- **Learning Rate**: 2e-4
  *Selected for a moderate-duration fine-tune. Consider 2e-5 for longer runs.*
- **Evaluation Strategy**: `steps`
- **Evaluation Steps**: 50
- **Logging Steps**: 5
- **Optimizer**: `adamw_8bit`
- **Weight Decay**: 0.01
- **LR Scheduler Type**: `linear`
- **Seed**: 3407

### QLoRA Config
- **rank**: 128
- **alpha**: 256
- **dropout**: 0
- **modules**: `"q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"`

---

### Notes
- The `adamw_8bit` optimizer allows for memory-efficient training.
- Small per-device batch sizes are compensated for by gradient accumulation, which simulates a larger effective batch.
- Evaluating and saving every 50 steps enables close tracking of training progress and allows early stopping if needed.
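For reference, the hyperparameters above map onto a training script roughly as follows. This is a minimal sketch assuming the Unsloth + TRL `SFTTrainer` stack (the same stack as the inference example further down); exact argument names vary across TRL versions, and the way the LIMO `question`/`solution` columns are formatted into chat-template text is illustrative, not taken from the actual training script.

```python
# Reproduction sketch only — not the exact training script used for this model.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit for QLoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen3-14B",
    max_seq_length = 32768,
    load_in_4bit = True,
)

# Attach LoRA adapters matching the QLoRA config above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    lora_alpha = 256,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# 817 curated samples; column names are assumed from the GAIR/LIMO dataset card.
dataset = load_dataset("GAIR/LIMO", split = "train")

def to_text(example):
    # Illustrative formatting: wrap each question/solution pair with the chat template.
    messages = [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["solution"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize = False)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,   # effective batch size of 8
        warmup_steps = 5,
        num_train_epochs = 8,
        learning_rate = 2e-4,
        logging_steps = 5,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        # per_device_eval_batch_size = 2, eval_strategy = "steps", eval_steps = 50
        # (enable these and pass an eval_dataset to evaluate every 50 steps)
    ),
)
trainer.train()
```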
---

### Evaluations

| Model         | AIME24    | MATH500 | Training Samples |
| ------------- | --------- | ------- | ---------------- |
| Ours          | **80.0%** | WIP     | 817              |
| LIMO          | 57.1%     | 94.8%   | 817              |
| Previous SOTA | 6.5%      | 59.2%   | 100k+            |

### Intended Use

The model is intended for research and experimentation in instruction-following tasks. Its performance should be validated on downstream tasks before deployment in production.

---

### Example Usage (unsloth)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mrm8488/Qwen3-14B-ft-limo",
    max_seq_length = 32768,
    fast_inference = True,  # requires vLLM; install it if you want to use `fast_inference`
    # load_in_4bit = False,
    # load_in_8bit = False,
)

messages = [
    {"role": "user", "content": "Solve (x + 2)^2 = 0."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
    enable_thinking = False,  # disable Qwen3's native thinking mode; the LIMO fine-tune already reasons step by step
)

from transformers import TextStreamer

_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 8192,  # increase for longer outputs
    temperature = 0.7, top_p = 0.8, top_k = 20,  # non-thinking sampling settings, since thinking mode is disabled
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)
```

### Example Usage (Hugging Face)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mrm8488/Qwen3-14B-ft-limo"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Solve (x + 2)^2 = 0."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # disable Qwen3's native thinking mode; the LIMO fine-tune already reasons step by step
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate the completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

print(tokenizer.decode(output_ids, skip_special_tokens=True))
```

---

### Limitations

TBD

---

### Acknowledgements

I extend my gratitude to the GAIR team for providing the GAIR/LIMO dataset and training recipe.