soupstick
/

smollm3-qlora-ft

Text Generation

Model card Files Files and versions

smollm3-qlora-ft / README.md

soupstick's picture

Update README.md

5537a38 verified about 1 month ago

|

history blame contribute delete

2.94 kB

	---
	license: apache-2.0
	language: en
	library_name: transformers
	pipeline_tag: text-generation
	datasets:
	- Open-Orca/OpenOrca
	- Open-Orca/SlimOrca
	base_model: HuggingFaceTB/SmolLM3-3B
	tags:
	- qlora
	- smollm3
	- fine-tuned
	- rag
	---

	# 🧠 SmolLM3 QLoRA - OpenOrca Fine-Tuned

	SmolLM3 QLoRA is a lightweight, 3B parameter open-source language model based on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B), fine-tuned using [QLoRA](https://arxiv.org/abs/2305.14314) on the [OpenOrca Slim](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset (500K examples). It is optimized for retrieval-augmented generation (RAG) use cases and delivers competitive benchmark scores against much larger models like LLaMA-2 7B.

	---

	## ✨ Model Highlights

	- 🔍 Trained for real-world queries using OpenOrca-style assistant data.
	- ⚡ Efficient: 3B parameter model that runs on a single A100 or consumer GPU.
	- 🧠 Competent generalist: Performs well on reasoning and knowledge tasks.
	- 🔗 RAG-friendly: Ideal for hybrid search setups using BM25 + FAISS.
	- 🧪 Evaluated on benchmarks: Outperforms similar-sized models.

	---

	## 🧰 Intended Use

	SmolLM3 QLoRA is intended to serve as a fast and compact assistant model for:

	- 💬 Lightweight RAG pipelines
	- 📚 Document and web snippet reasoning
	- 🤖 Prototype assistants
	- 🧪 AI research in instruction tuning and hybrid retrieval

	---

	## 🧪 Evaluation

	The model has been evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on a 500-sample subset of key academic benchmarks.

	\| Task \| Accuracy \| Normalized Accuracy \| LLaMA-2 7B \|
	\|----------------\|----------\|---------------------\|------------\|
	\| HellaSwag \| 51.2% \| 66.4% \| 56.7% / 73.2% \|
	\| ARC-Challenge \| 49.4% \| 52.2% \| 53.7% / 56.9% \|
	\| BoolQ \| 81.0% \| — \| 83.1% \|

	👉 Model achieves ~90–95% of LLaMA-2 7B performance at less than half the size.

	---

	## 🏗️ Training Configuration

	- Base Model: [`SmolLM3-3B`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
	- Finetuning Method: QLoRA (LoRA rank=8)
	- Dataset: `Open-Orca/SlimOrca` (500K samples)
	- Precision: bfloat16
	- Epochs: 3
	- Max Length: 1024 tokens
	- Hardware: 2x A100 80GB
	- Framework: 🤗 Transformers + TRL

	---

	## 🧠 How to Use

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "soupstick/smollm3-qlora-ft"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	device_map="auto",
	torch_dtype="auto"
	)

	inputs = tokenizer("Explain retrieval-augmented generation.", return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=300)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))