SmolLM2-135M-Instruct-Plus

This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M-Instruct that aims to pack as much knowledge as possible into a small 135M-parameter model.

⚠️ Treat this model as a creative text generator. Without further fine-tuning it gives wildly inaccurate answers, so do not trust its output without independent verification.

Model Details

Intended Uses

For research, experimentation, and educational purposes where a small instruction-following model is desired.
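
For quick experimentation, the model can be loaded with the standard transformers chat API. This is a minimal sketch; the prompt and generation settings are illustrative, not recommendations from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/SmolLM2-135M-Instruct-Plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "What is photosynthesis?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```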

Limitations

  • Hallucinations: Prone to generating incorrect information due to its small size.
  • Repetitive Output: May produce repetitive text.

Training Details

Both SFT and DPO share common settings: liger_kernel booster, LoRA fine-tuning, custom model, BF16 compute type, batch size of 2, and a cosine scheduler with a learning rate of 5e-5. RSLoRA is enabled with a rank of 16 and alpha of 32.
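
The card does not publish the training scripts, so the following is only a rough sketch of how the shared settings above could be expressed with the Hugging Face peft and transformers APIs; the task type, output path, and library choice are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# RSLoRA adapter with the rank and alpha reported on the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    use_rslora=True,        # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",  # assumption: causal language modeling adapters
)

# Shared optimization settings: BF16 compute, batch size 2,
# cosine schedule, learning rate 5e-5.
# (The card also reports the liger_kernel booster; recent transformers
# versions expose this as use_liger_kernel=True.)
training_args = TrainingArguments(
    output_dir="smollm2-135m-instruct-plus",  # hypothetical path
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)
```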

The main differences are the datasets and a few stage-specific settings: SFT uses CrashCourse_120K with packing enabled and a LoRA dropout of 0, while DPO uses orca_pairs with packing disabled and a LoRA dropout of 0.95.
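
The card does not name the training framework; as a rough illustration, the stage-specific differences could be expressed with trl trainer configs. Dataset identifiers are as named above, the output paths are hypothetical, and the LoRA dropout values live on the adapters rather than in these configs.

```python
from trl import SFTConfig, DPOConfig

# SFT stage: CrashCourse_120K with sequence packing enabled
# (LoRA dropout of 0 is set on the adapter, not here).
sft_args = SFTConfig(
    output_dir="sft-stage",  # hypothetical path
    packing=True,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)

# DPO stage: orca_pairs preference data, packing disabled
# (LoRA dropout of 0.95 is likewise set on the adapter).
dpo_args = DPOConfig(
    output_dir="dpo-stage",  # hypothetical path
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)
```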

Evaluation

The model provides coherent and creative answers, but they are often incorrect. Thorough evaluation is recommended before any deployment.
