---
license: apache-2.0
datasets:
- HuggingFaceH4/Bespoke-Stratos-17k
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Introducing Llama-3.2-1B-Instruct-Open-R1-Distill
Built on **Llama-3.2-1B-Instruct** and Hugging Face's [OpenR1](https://github.com/huggingface/open-r1), a fully open reproduction of **DeepSeek-R1**, this model brings powerful reasoning capabilities to compact, efficient architectures.
## Why This Matters
I have always been passionate about pushing the boundaries of **LLM** technology in smaller models that can run seamlessly on laptop CPUs and smartphones.
With the recent breakthrough of **DeepSeek-R1**, developing a high-quality reasoning model through distillation has become remarkably straightforward. It requires only **supervised fine-tuning (SFT)** on a dataset generated by a teacher model.
Thanks to **Hugging Face**, we now have a streamlined framework to make this process more accessible than ever.
### Model Description
- **Developed by:** keeeeenw
- **Funded by [optional]:** myself for < $500
- **Model type:** Llama-3.2-1B-Instruct with reasoning capability
- **License:** Apache License 2.0
- **Finetuned from model [optional]:** Llama-3.2-1B-Instruct
## Uses
- **On-device AI assistants** for reasoning and general-purpose tasks
- **Mobile and edge AI applications** requiring lightweight models
- **Chatbots and virtual assistants** optimized for efficiency
- **Fine-tuning for specific domains** with SFT training
### How to run the code?
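```python
from transformers import AutoTokenizer, LlamaForCausalLM, TextStreamer

model_id = "keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill"
# Load the tokenizer and the model from the same repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)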
# Prompt supported by HuggingFaceH4/Bespoke-Stratos-17k
messages = [
{
"role": "system",
"content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:",
},
# question from https://www.reddit.com/r/LocalLLaMA/comments/13zz8y5/what_questions_do_you_ask_llms_to_check_their/
{"role": "user", "content": "Please provide me instructions on how to steal an egg from my chicken?"},
]
# Render the chat template to a string; the generation prompt opens the assistant turn
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted_chat)
# The Llama 3 chat template already emits the BOS token, so do not add special tokens again
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
# Stream tokens to stdout as they are generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    streamer=streamer,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=2048,
)
print(tokenizer.decode(outputs[0]))
```
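The system prompt above asks the model to wrap its reasoning in `<|begin_of_thought|>` / `<|end_of_thought|>` and its answer in `<|begin_of_solution|>` / `<|end_of_solution|>`. A minimal sketch of splitting a decoded response on those markers follows. `split_response` is a hypothetical helper (not part of `transformers`), and it assumes the decoded text keeps the markers as plain text.

```python
import re

# Markers copied from the system prompt; DOTALL lets "." span newlines.
THOUGHT_RE = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)


def split_response(text: str) -> dict:
    """Return the thought and solution sections; a missing section maps to None."""
    thought = THOUGHT_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return {
        "thought": thought.group(1).strip() if thought else None,
        "solution": solution.group(1).strip() if solution else None,
    }


# Hand-written stand-in for a decoded generation (not real model output).
example = (
    "<|begin_of_thought|>\n\nStep 1\n\nStep 2\n\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\n\nCollect the egg.\n\n<|end_of_solution|>"
)
parts = split_response(example)
print(parts["solution"])
```

A generation that stops at `max_new_tokens` may never emit the closing solution marker; in that case the sketch returns `None` for that section rather than a partial answer.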
## Training Details
To reproduce the results, install Hugging Face's [OpenR1](https://github.com/huggingface/open-r1) package, then run the following command:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py --config recipes/config_llama3_instrcut_1b.yaml
```
You can create your own `recipes/config_llama3_instrcut_1b.yaml` by copying [config_full.yaml](https://github.com/huggingface/open-r1/blob/main/recipes/qwen/Qwen2.5-1.5B-Instruct/sft/config_full.yaml)
to the desired folder and changing the model path to `model_name_or_path: meta-llama/Llama-3.2-1B-Instruct` or any Hugging Face model repo id you are interested in.
You may also choose to train for more than 1 epoch (I trained for 5 epochs).
Also, if you want to get intermediate checkpoints, set the save parameters accordingly:
```
save_strategy: "steps"
save_steps: 100
```
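The config changes described above (model path, epoch count, checkpoint settings) can also be scripted. The sketch below is one possible way to do it with plain text replacement; `override_config` is a hypothetical helper, the two-line `base` string is a stand-in rather than the real contents of `config_full.yaml`, and `num_train_epochs` is assumed to be the key the recipe uses for the epoch count.

```python
import re


def override_config(yaml_text: str, overrides: dict) -> str:
    """Replace top-level `key: value` lines and append keys that are absent."""
    lines = yaml_text.splitlines()
    for key, value in overrides.items():
        pattern = re.compile(rf"^{re.escape(key)}\s*:")
        new_line = f"{key}: {value}"
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = new_line  # overwrite the existing setting
                break
        else:
            lines.append(new_line)  # key not present, so add it at the end
    return "\n".join(lines) + "\n"


# Stand-in for the copied recipe (assumption: the real file has more keys).
base = "model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct\nnum_train_epochs: 1\n"
updated = override_config(
    base,
    {
        "model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
        "num_train_epochs": 5,
        "save_strategy": "steps",
        "save_steps": 100,
    },
)
print(updated)
```

This only handles flat, top-level keys. For nested YAML, a real YAML parser would be the safer choice.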
I tried a train and eval batch size of 1 on a single Nvidia 4090 but still ran out of memory (OOM), so I rented 4 x L40S GPUs from [vast.ai](https://vast.ai). Training for 5 epochs took less than 4 hours with the following batch sizes:
```
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
```
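As a rough sanity check on these settings, the effective batch size is the per-device batch size times the number of GPUs times the gradient accumulation steps. The sketch below works through that arithmetic. The accumulation value of 1 and the round figure of 17,000 examples (taken from the dataset name) are assumptions, and sequence packing, if enabled in the recipe, would change the step count.

```python
import math


def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_gpus * grad_accum


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, counting a final partial batch in each epoch."""
    return math.ceil(num_examples / batch_size) * epochs


batch = effective_batch_size(per_device=4, num_gpus=4)  # grad_accum=1 is assumed
steps = optimizer_steps(num_examples=17_000, batch_size=batch, epochs=5)
print(batch, steps)
```

With `save_steps: 100`, a step count in this range would mean a checkpoint roughly every 2% of training.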
## Evaluation
The evaluation follows the instructions in Hugging Face's [OpenR1](https://github.com/huggingface/open-r1) repository:
```
NUM_GPUS=4
MODEL="/root/open-r1/data/meta-llama/Llama-3.2-1B-Instruct"
MODEL_ARGS="pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=aime24
OUTPUT_DIR=data/evals/$MODEL
lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
--custom-tasks src/open_r1/evaluate.py \
--use-chat-template \
--system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
--output-dir $OUTPUT_DIR
```
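The system prompt in the command above asks for the final answer inside `\boxed{}`. For spot-checking individual generations by hand, a small brace-matching extractor can pull that answer out. This is an illustrative sketch only: `extract_boxed` is a hypothetical helper and is not how `lighteval` scores the task.

```python
def extract_boxed(text: str):
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)  # matching close brace found
        chars.append(ch)
    return None  # unbalanced braces, e.g. a truncated generation


answer = extract_boxed(r"Therefore the result is \boxed{\frac{1}{2}}.")
print(answer)
```

Taking the last `\boxed{}` rather than the first is a design choice: reasoning models often box intermediate values before the final answer.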
Results: To be added