---
license: mit
datasets:
- mlabonne/FineTome-100k
- microsoft/orca-math-word-problems-200k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
library_name: transformers
---

# 🔧 LoRA Adapter: Fine-tuned Llama 3.2 3B on FineTome + Orca Math (4-bit, Unsloth)

This repository contains a **LoRA adapter** trained on a combination of the **FineTome-100k** and **Orca Math Word Problems** datasets, using the [Unsloth Llama 3.2 3B Instruct 4-bit model](https://huggingface.co/unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit) as the base. The adapter is meant to be loaded on top of the base model `unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit`.

---

## 🧠 Model Architecture

- **Base Model**: [`unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit)
- **Adapter Type**: LoRA (parameter-efficient fine-tuning)
- **Chat Format**: Llama 3.1-style, applied during training via `get_chat_template(tokenizer, chat_template="llama-3.1")`

---

## 🏋️‍♂️ Training Configuration

- **Max Steps**: 600
- **Batch Size**: 2 per device
- **Gradient Accumulation**: 4 steps
- **Max Sequence Length**: 2048 tokens
- **Learning Rate**: 2e-4
- **Warmup Steps**: 5
- **Optimizer**: `paged_adamw_8bit`
- **Precision**: Mixed (fp16 or bf16, depending on GPU support)

A hedged sketch of a training run with these settings is included at the end of this card.

---

## 🔎 Inference Instructions

To use this adapter, first load the base model, then apply the LoRA adapter on top.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit base model (its quantization config ships with the checkpoint)
base_model = "unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True
)

# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load the LoRA adapter (replace with your own repo id if you have forked it)
adapter_repo = "ajaypanigrahi/Llama-3.2-3B-instruct-finetunme-orca-lora-600steps"
model = PeftModel.from_pretrained(model, adapter_repo)

# Inference
prompt = "Write a Python function that returns the first 8 Fibonacci numbers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
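
Because the adapter was trained on Llama 3.1-style chat-formatted data, instruction prompts should normally be wrapped in the same chat template rather than passed as raw text. Continuing from the snippet above, here is a minimal sketch, assuming the base tokenizer already carries a matching Llama 3.x chat template (otherwise apply Unsloth's `get_chat_template` first):

```python
# Build a chat-formatted prompt; this assumes the base tokenizer ships a
# Llama 3.x chat template compatible with the one used during training.
messages = [
    {"role": "user", "content": "Write a Python function that returns the first 8 Fibonacci numbers."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```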
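
---

## 🛠️ Training Sketch (Unsloth + TRL)

The configuration above records the core hyperparameters but not the full training script. The snippet below is a minimal sketch of how a comparable run could look with Unsloth and TRL's `SFTTrainer`. The LoRA rank, alpha, dropout, and target modules, as well as the exact dataset mixing and formatting, are not documented in this card; the values shown for those are placeholder assumptions, not the settings used for this checkpoint.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# Load the 4-bit base model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters. r / lora_alpha / target_modules are placeholders;
# the actual values for this adapter are not recorded in the card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Llama 3.1-style chat template, as noted under Model Architecture
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Only FineTome-100k is shown here; the real run also mixed in
# microsoft/orca-math-word-problems-200k (conversion step elided).
dataset = standardize_sharegpt(load_dataset("mlabonne/FineTome-100k", split="train"))
dataset = dataset.map(
    lambda ex: {
        "text": tokenizer.apply_chat_template(
            ex["conversations"], tokenize=False, add_generation_prompt=False
        )
    }
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=600,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```

After training, the LoRA weights can be saved locally with `model.save_pretrained(...)` or uploaded with `model.push_to_hub(...)`.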