---
license: mit
datasets:
- mlabonne/FineTome-100k
- microsoft/orca-math-word-problems-200k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
library_name: transformers
---

# 🔧 LoRA Adapter: Fine-tuned Llama 3.2 3B on FineTome + Orca Math (4-bit, Unsloth)

This repository contains a **LoRA adapter** trained on a combination of the **FineTome-100k** and **Orca Math Word Problems** datasets, using the [Unsloth Llama 3.2 3B Instruct 4-bit model](https://huggingface.co/unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit) as the base. The adapter is meant to be loaded on top of the base model `unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit`.

---

## 🧠 Model Architecture

- **Base Model**: [`unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit)
- **Adapter Type**: LoRA (parameter-efficient fine-tuning)
- **Chat Format**: Llama 3.1-style, applied during training via `get_chat_template(tokenizer, chat_template="llama-3.1")`

---

## 🏋️‍♂️ Training Configuration

- **Max Steps**: 600
- **Batch Size**: 2 per device
- **Gradient Accumulation**: 4 steps
- **Max Sequence Length**: 2048 tokens
- **Learning Rate**: 2e-4
- **Warmup Steps**: 5
- **Optimizer**: `paged_adamw_8bit`
- **Precision**: Mixed (fp16 or bf16, depending on GPU support)

A hedged sketch of a training run with these settings is included at the end of this card.

---

## 🔎 Inference Instructions

To use this adapter, first load the base model, then apply the LoRA adapter on top.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit base model (its quantization config ships with the checkpoint)
base_model = "unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True
)

# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load the LoRA adapter (replace with your own repo id if you have forked it)
adapter_repo = "ajaypanigrahi/Llama-3.2-3B-instruct-finetunme-orca-lora-600steps"
model = PeftModel.from_pretrained(model, adapter_repo)

# Inference
prompt = "Write a Python function that returns the first 8 Fibonacci numbers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
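
Because the adapter was trained on Llama 3.1-style chat-formatted data, instruction prompts should normally be wrapped in the same chat template rather than passed as raw text. Continuing from the snippet above, here is a minimal sketch, assuming the base tokenizer already carries a matching Llama 3.x chat template (otherwise apply Unsloth's `get_chat_template` first):

```python
# Build a chat-formatted prompt; this assumes the base tokenizer ships a
# Llama 3.x chat template compatible with the one used during training.
messages = [
    {"role": "user", "content": "Write a Python function that returns the first 8 Fibonacci numbers."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```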
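
---

## 🛠️ Training Sketch (Unsloth + TRL)

The configuration above records the core hyperparameters but not the full training script. The snippet below is a minimal sketch of how a comparable run could look with Unsloth and TRL's `SFTTrainer`. The LoRA rank, alpha, dropout, and target modules, as well as the exact dataset mixing and formatting, are not documented in this card; the values shown for those are placeholder assumptions, not the settings used for this checkpoint.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# Load the 4-bit base model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters. r / lora_alpha / target_modules are placeholders;
# the actual values for this adapter are not recorded in the card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Llama 3.1-style chat template, as noted under Model Architecture
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Only FineTome-100k is shown here; the real run also mixed in
# microsoft/orca-math-word-problems-200k (conversion step elided).
dataset = standardize_sharegpt(load_dataset("mlabonne/FineTome-100k", split="train"))
dataset = dataset.map(
    lambda ex: {
        "text": tokenizer.apply_chat_template(
            ex["conversations"], tokenize=False, add_generation_prompt=False
        )
    }
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=600,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```

After training, the LoRA weights can be saved locally with `model.save_pretrained(...)` or uploaded with `model.push_to_hub(...)`.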