---
license: mit
datasets:
- mlabonne/FineTome-100k
- microsoft/orca-math-word-problems-200k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
library_name: transformers
---
# 🧠 LoRA Adapter: Fine-tuned Llama 3.2 3B on FineTome + Orca (4-bit, Unsloth)
This repository contains a LoRA adapter trained on a combined dataset of [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Orca Math Word Problems](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k), using the 4-bit Unsloth build of Llama 3.2 3B Instruct as the base.

It is intended to be used with the base model [`unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit).
## 🧠 Model Architecture
- **Base Model:** [`unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit)
- **Adapter Type:** LoRA (parameter-efficient fine-tuning)
- **Chat Format:** Llama 3.1-style, applied with `get_chat_template(tokenizer, chat_template="llama-3.1")` (see the sketch below)
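
The snippet below shows how the same prompt formatting can be reproduced. This is a minimal sketch that assumes the `unsloth` package (and a CUDA environment) is available; the example message is purely illustrative.

```python
from unsloth.chat_templates import get_chat_template
from transformers import AutoTokenizer

# Attach the llama-3.1 chat template that was used during training
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit")
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Render a conversation into the prompt string the model saw in training
messages = [{"role": "user", "content": "What is 2 + 2?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```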
## 🏋️‍♂️ Training Configuration
- **Max Steps:** 600
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps (effective batch size of 8 per device)
- **Max Sequence Length:** 2048 tokens
- **Learning Rate:** 2e-4
- **Warmup Steps:** 5
- **Optimizer:** `paged_adamw_8bit`
- **Precision:** mixed (fp16 or bf16, depending on GPU support)
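
The training script itself is not part of this repository, but the hyperparameters above map onto a TRL `SFTTrainer` setup roughly as sketched below. Dataset preparation and LoRA attachment are omitted, and passing `max_seq_length`/`dataset_text_field` directly to `SFTTrainer` assumes an older TRL version, as used in the Unsloth notebooks.

```python
import torch
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_steps=600,
    learning_rate=2e-4,
    warmup_steps=5,
    optim="paged_adamw_8bit",
    fp16=not torch.cuda.is_bf16_supported(),  # mixed precision chosen by GPU support
    bf16=torch.cuda.is_bf16_supported(),
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,              # base model with LoRA adapters attached (not shown)
    tokenizer=tokenizer,      # tokenizer carrying the llama-3.1 chat template
    train_dataset=dataset,    # combined FineTome + Orca data (preparation not shown)
    dataset_text_field="text",
    max_seq_length=2048,
    args=args,
)
trainer.train()
```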
## 🚀 Inference Instructions
To use this adapter, you must first load the base model and then apply this LoRA adapter on top.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit quantized base model (requires `bitsandbytes` and a CUDA GPU)
base_model = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    trust_remote_code=True,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Apply this LoRA adapter on top of the base model
adapter_repo = "ajaypanigrahi/Llama-3.2-3B-instruct-finetunme-orca-lora-600steps"
model = PeftModel.from_pretrained(model, adapter_repo)

# Inference
prompt = "Write a Python function that returns the first 8 Fibonacci numbers."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
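
Because the adapter was trained on chat-formatted data, wrapping the prompt in the chat template generally matches the training distribution better than a raw string. Continuing from the snippet above (the message content is the same example prompt):

```python
# Format the prompt with the llama-3.1 chat template before generating
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```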