---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
datasets:
  - Open-Orca/OpenOrca
  - Open-Orca/SlimOrca
base_model: HuggingFaceTB/SmolLM3-3B
tags:
  - qlora
  - smollm3
  - fine-tuned
  - rag
---

# 🧠 SmolLM3 QLoRA - OpenOrca Fine-Tuned

**SmolLM3 QLoRA** is a lightweight, 3B-parameter open-source language model based on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B), fine-tuned with [QLoRA](https://arxiv.org/abs/2305.14314) on the [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset (500K examples). It is optimized for **retrieval-augmented generation (RAG)** use cases and delivers **competitive benchmark scores** against much larger models such as LLaMA-2 7B.

---

## ✨ Model Highlights

- 🔍 **Trained for real-world queries** using OpenOrca-style assistant data.
- ⚡ **Efficient:** a 3B-parameter model that runs on a single A100 or a consumer GPU.
- 🧠 **Competent generalist:** performs well on reasoning and knowledge tasks.
- 🔗 **RAG-friendly:** well suited to hybrid search setups combining BM25 and FAISS (see the retrieval sketch near the end of this card).
- 🧪 **Evaluated on benchmarks:** outperforms similarly sized models.

---

## 🧰 Intended Use

SmolLM3 QLoRA is intended to serve as a fast and compact assistant model for:

- 💬 Lightweight RAG pipelines
- 📚 Document and web-snippet reasoning
- 🤖 Prototype assistants
- 🧪 AI research on instruction tuning and hybrid retrieval

---

## 🧪 Evaluation

The model was evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on 500-sample subsets of key academic benchmarks. The last column lists LLaMA-2 7B's accuracy / normalized accuracy for comparison.

| Task | Accuracy | Normalized Accuracy | LLaMA-2 7B (acc / acc_norm) |
|-------------------|----------|---------------------|-----------------------------|
| **HellaSwag**     | 51.2%    | 66.4%               | 56.7% / 73.2%               |
| **ARC-Challenge** | 49.4%    | 52.2%               | 53.7% / 56.9%               |
| **BoolQ**         | 81.0%    | N/A                 | 83.1%                       |

👉 The model reaches **~90–95% of LLaMA-2 7B** performance at less than **half the size**. A sketch for reproducing this run appears near the end of this card.

---

## 🏗️ Training Configuration

- **Base Model:** [`SmolLM3-3B`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- **Finetuning Method:** QLoRA (LoRA rank = 8)
- **Dataset:** `Open-Orca/SlimOrca` (500K samples)
- **Precision:** bfloat16
- **Epochs:** 3
- **Max Length:** 1024 tokens
- **Hardware:** 2x A100 80GB
- **Framework:** 🤗 Transformers + TRL

A minimal fine-tuning sketch is included at the end of this card.

---

## 🧠 How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "soupstick/smollm3-qlora-ft"

# Load the fine-tuned checkpoint, letting transformers pick the device map and dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto"
)

inputs = tokenizer("Explain retrieval-augmented generation.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
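SmolLM3 is an instruction-tuned chat model, so prompts generally work better when routed through the tokenizer's chat template. The snippet below continues from the one above and is a minimal sketch; it assumes this checkpoint keeps the base model's chat template.

```python
# Continues from the snippet above; assumes the base model's chat template is kept.
messages = [
    {"role": "user", "content": "Explain retrieval-augmented generation in two sentences."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Strip the prompt tokens so only the newly generated answer is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```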
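---

## 🔗 Example: Hybrid Retrieval for RAG

The highlights above mention hybrid BM25 + FAISS retrieval. The sketch below shows one way to wire that up and feed the retrieved context to the model; the `rank_bm25`, `faiss`, and `sentence-transformers` packages, the embedding model, the toy documents, and the score-blending weight are illustrative choices, not part of this repository.

```python
# Illustrative hybrid retrieval: blend sparse (BM25) and dense (FAISS) scores.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "QLoRA fine-tunes a 4-bit quantized base model through low-rank adapters.",
    "FAISS performs fast similarity search over dense vectors.",
    "BM25 is a sparse lexical ranking function used in search engines.",
]

# Sparse index: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index: normalized sentence embeddings in a FAISS inner-product index.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_emb = np.asarray(encoder.encode(docs, normalize_embeddings=True), dtype="float32")
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb)

def retrieve(query, k=2, alpha=0.5):
    """Blend normalized BM25 and dense scores and return the top-k documents."""
    sparse = np.asarray(bm25.get_scores(query.lower().split()))
    sparse = sparse / (sparse.max() + 1e-8)

    q_emb = np.asarray(encoder.encode([query], normalize_embeddings=True), dtype="float32")
    sims, ids = index.search(q_emb, len(docs))
    dense = np.zeros(len(docs))
    dense[ids[0]] = sims[0]  # map ranked FAISS results back to document order

    combined = alpha * sparse + (1 - alpha) * dense
    return [docs[i] for i in np.argsort(-combined)[:k]]

question = "How does QLoRA work?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
# `prompt` can then be passed to the generation code from "How to Use".
```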
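---

## 🧪 Reproducing the Evaluation

The scores above come from lm-evaluation-harness run on 500-sample subsets of each task. The call below is a sketch against the harness's v0.4-style Python API; argument names can differ between harness versions, so treat it as a starting point rather than the exact invocation used.

```python
# Sketch: re-running the 500-sample benchmark subsets with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=soupstick/smollm3-qlora-ft,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge", "boolq"],
    limit=500,  # evaluate 500 samples per task, matching the table above
)
print(results["results"])
```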
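---

## 🏗️ Fine-Tuning Sketch

For reference, the training configuration above maps onto a PEFT + TRL setup roughly like the one below. This is a minimal sketch, not the released training script: the LoRA target modules, learning rate, and batch sizes are illustrative assumptions, and TRL/PEFT argument names vary across versions.

```python
# Minimal QLoRA fine-tuning sketch with Transformers, PEFT, and TRL.
# Only the values from "Training Configuration" are taken from this card;
# the remaining hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_id = "HuggingFaceTB/SmolLM3-3B"

# QLoRA: load the frozen base model in 4-bit NF4 and compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters at rank 8, as listed above; target modules are an assumption.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def to_messages(example):
    """Map SlimOrca's ShareGPT-style 'conversations' into chat 'messages'."""
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    return {
        "messages": [
            {"role": role_map[turn["from"]], "content": turn["value"]}
            for turn in example["conversations"]
        ]
    }

dataset = load_dataset("Open-Orca/SlimOrca", split="train")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="smollm3-qlora-ft",
        num_train_epochs=3,
        max_seq_length=1024,
        bf16=True,
        per_device_train_batch_size=4,  # assumption
        gradient_accumulation_steps=8,  # assumption
        learning_rate=2e-4,             # assumption
    ),
)
trainer.train()
```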