Used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier thinker models. Results are encouraging. Checkpoints are also included.

https://github.com/ewre324/open-r1/tree/main

Follows the DeepSeek R1 approach of training on a dedicated reasoning dataset to encourage more thinking. The ... tags are still not generated. TODO.

Model size: 135M params (Safetensors, tensor type F32)

Model: ewre324/ewre324-R1-SmolLM2-135M-Distill