---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
datasets:
- Open-Orca/OpenOrca
- Open-Orca/SlimOrca
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- qlora
- smollm3
- fine-tuned
- rag
---
# 🧠 SmolLM3 QLoRA - OpenOrca Fine-Tuned
**SmolLM3 QLoRA** is a lightweight, 3B-parameter open-source language model based on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B), fine-tuned with [QLoRA](https://arxiv.org/abs/2305.14314) on the [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset (500K examples). It is optimized for **retrieval-augmented generation (RAG)** use cases and delivers benchmark scores **competitive with larger models** such as LLaMA-2 7B.
---
## ✨ Model Highlights
- 🔍 **Trained for real-world queries** using OpenOrca-style assistant data.
- ⚡ **Efficient:** 3B parameter model that runs on a single A100 or consumer GPU.
- 🧠 **Competent generalist:** Performs well on reasoning and knowledge tasks.
- 🔗 **RAG-friendly:** Ideal for hybrid retrieval setups combining BM25 with FAISS (see the sketch after this list).
- 🧪 **Evaluated on benchmarks:** Outperforms similar-sized models.
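
As a rough illustration of the hybrid setup mentioned above, the sketch below fuses BM25 and FAISS scores before building a RAG prompt. It is not part of this repository: it assumes the `rank_bm25`, `faiss-cpu` and `sentence-transformers` packages, and the embedding model and score-fusion weights are illustrative choices.

```python
# Minimal hybrid-retrieval sketch (illustrative, not shipped with this model).
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "Retrieval-augmented generation grounds answers in retrieved passages.",
    "QLoRA fine-tunes quantized models with low-rank adapters.",
    "FAISS performs fast nearest-neighbor search over dense vectors.",
]
query = "How does RAG ground model answers?"

# Sparse side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
sparse = np.array(bm25.get_scores(query.lower().split()))

# Dense side: cosine similarity via a normalized FAISS inner-product index.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # illustrative choice
emb = encoder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)
dense, ids = index.search(encoder.encode([query], normalize_embeddings=True), len(docs))

# Re-order dense scores by document index so they align with the BM25 scores.
dense_by_doc = np.zeros(len(docs), dtype="float32")
dense_by_doc[ids[0]] = dense[0]

def minmax(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

# Equal-weight fusion of the two normalized score vectors (weights are arbitrary).
fused = 0.5 * minmax(sparse) + 0.5 * minmax(dense_by_doc)
context = docs[int(fused.argmax())]
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```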
---
## 🧰 Intended Use
SmolLM3 QLoRA is intended to serve as a fast and compact assistant model for:
- 💬 Lightweight RAG pipelines
- 📚 Document and web snippet reasoning
- 🤖 Prototype assistants
- 🧪 AI research in instruction tuning and hybrid retrieval
---
## 🧪 Evaluation
The model has been evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on 500-sample subsets of key academic benchmarks.
| Task | Accuracy | Normalized Accuracy | LLaMA-2 7B (acc / acc_norm) |
|-------------------|----------|---------------------|------------------------------|
| **HellaSwag**     | 51.2%    | 66.4%               | 56.7% / 73.2%                |
| **ARC-Challenge** | 49.4%    | 52.2%               | 53.7% / 56.9%                |
| **BoolQ**         | 81.0%    | —                   | 83.1%                        |
👉 The model achieves **~90–95% of LLaMA-2 7B** performance at **less than half the parameter count**.
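
The exact evaluation command is not included in this card. The sketch below shows one plausible way to reproduce a 500-sample run through the harness's Python entry point; it assumes lm-evaluation-harness v0.4+, and the batch size and dtype are illustrative.

```python
# Hedged reproduction sketch using lm-evaluation-harness (v0.4+ Python API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=soupstick/smollm3-qlora-ft,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge", "boolq"],
    limit=500,       # 500-sample subset per task, as reported above
    batch_size=8,    # illustrative
)
print(results["results"])
```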
---
## 🏗️ Training Configuration
- **Base Model:** [`SmolLM3-3B`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- **Finetuning Method:** QLoRA (LoRA rank=8)
- **Dataset:** `Open-Orca/SlimOrca` (500K samples)
- **Precision:** bfloat16
- **Epochs:** 3
- **Max Length:** 1024 tokens
- **Hardware:** 2x A100 80GB
- **Framework:** 🤗 Transformers + TRL
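
The exact training script is not included here. The sketch below wires the listed settings into TRL's `SFTTrainer` under stated assumptions: the 4-bit quantization settings, `lora_alpha`/dropout, target modules, batch sizes, and the prompt flattening for SlimOrca's ShareGPT-style `conversations` field are illustrative, and it assumes a TRL version whose `SFTConfig` exposes `max_seq_length` and `dataset_text_field`.

```python
# Minimal QLoRA fine-tuning sketch matching the configuration above.
# Only base model, dataset, LoRA rank, epochs, max length and bf16 come from
# the table; everything else is an assumption.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the base model (QLoRA); settings are typical, not confirmed.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B", quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# 500K SlimOrca samples; the ShareGPT-style turns are flattened into plain text
# here for brevity (the actual prompt template used for training is not specified).
dataset = load_dataset("Open-Orca/SlimOrca", split="train[:500000]")

def to_text(example):
    roles = {"system": "System", "human": "User", "gpt": "Assistant"}
    turns = [f"{roles.get(t['from'], t['from'])}: {t['value']}" for t in example["conversations"]]
    return {"text": "\n".join(turns)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

# LoRA rank 8 as listed; alpha, dropout and target modules are assumptions.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules="all-linear", task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="smollm3-qlora-ft",
        num_train_epochs=3,
        max_seq_length=1024,
        bf16=True,
        dataset_text_field="text",
        per_device_train_batch_size=4,   # illustrative
        gradient_accumulation_steps=4,   # illustrative
    ),
)
trainer.train()
```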
---
## 🧠 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "soupstick/smollm3-qlora-ft"

# Load the tokenizer and model; device_map/torch_dtype="auto" pick a sensible
# device placement and precision for the available hardware.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Generate a completion for a plain-text prompt.
inputs = tokenizer("Explain retrieval-augmented generation.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
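
If the fine-tune retains the base model's chat template (an assumption, not confirmed by this card), instruction-style prompts can also be built with `apply_chat_template`, continuing from the snippet above:

```python
# Optional: chat-style prompting, assuming the tokenizer still ships a chat template.
messages = [{"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=300)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```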