---
license: mit
datasets:
- mlabonne/FineTome-100k
- microsoft/orca-math-word-problems-200k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
library_name: transformers
---
# 🧠 LoRA Adapter: Fine-tuned Llama 3.2 3B on FineTome + Orca (4-bit, Unsloth)
This repository contains a LoRA adapter trained on a combined dataset of [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Orca Math Word Problems](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k), using the 4-bit Unsloth build of Llama 3.2 3B Instruct as the base.

It is intended to be used with the base model [`unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit).
## 🧠 Model Architecture
- **Base Model:** [`unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit)
- **Adapter Type:** LoRA (parameter-efficient fine-tuning)
- **Chat Format:** Llama 3.1-style, applied with `get_chat_template(tokenizer, chat_template="llama-3.1")` (see the sketch below)
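
The snippet below shows how the same prompt formatting can be reproduced. This is a minimal sketch that assumes the `unsloth` package (and a CUDA environment) is available; the example message is purely illustrative.

```python
from unsloth.chat_templates import get_chat_template
from transformers import AutoTokenizer

# Attach the llama-3.1 chat template that was used during training
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit")
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Render a conversation into the prompt string the model saw in training
messages = [{"role": "user", "content": "What is 2 + 2?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```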
## 🏋️‍♂️ Training Configuration
- **Max Steps:** 600
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps (effective batch size of 8 per device)
- **Max Sequence Length:** 2048 tokens
- **Learning Rate:** 2e-4
- **Warmup Steps:** 5
- **Optimizer:** `paged_adamw_8bit`
- **Precision:** mixed (fp16 or bf16, depending on GPU support)
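
The training script itself is not part of this repository, but the hyperparameters above map onto a TRL `SFTTrainer` setup roughly as sketched below. Dataset preparation and LoRA attachment are omitted, and passing `max_seq_length`/`dataset_text_field` directly to `SFTTrainer` assumes an older TRL version, as used in the Unsloth notebooks.

```python
import torch
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_steps=600,
    learning_rate=2e-4,
    warmup_steps=5,
    optim="paged_adamw_8bit",
    fp16=not torch.cuda.is_bf16_supported(),  # mixed precision chosen by GPU support
    bf16=torch.cuda.is_bf16_supported(),
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,              # base model with LoRA adapters attached (not shown)
    tokenizer=tokenizer,      # tokenizer carrying the llama-3.1 chat template
    train_dataset=dataset,    # combined FineTome + Orca data (preparation not shown)
    dataset_text_field="text",
    max_seq_length=2048,
    args=args,
)
trainer.train()
```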
## 🚀 Inference Instructions
To use this adapter, you must first load the base model and then apply this LoRA adapter on top.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit quantized base model (requires `bitsandbytes` and a CUDA GPU)
base_model = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Apply this LoRA adapter on top of the base model
adapter_repo = "ajaypanigrahi/Llama-3.2-3B-instruct-finetunme-orca-lora-600steps"
model = PeftModel.from_pretrained(model, adapter_repo)

# Inference
prompt = "Write a Python function that returns the first 8 Fibonacci numbers."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
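
Because the adapter was trained on chat-formatted data, wrapping the prompt in the chat template generally matches the training distribution better than a raw string. Continuing from the snippet above (the message content is the same example prompt):

```python
# Format the prompt with the llama-3.1 chat template before generating
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```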