# Model Card

## Overview
This repository contains a LoRA fine-tuned version of Meta's Llama-3.2-3B-Instruct model, trained with PEFT (LoRA) on a custom bank customer-service FAQ dataset for question answering. The adapter weights, configuration, and tokenizer files are included, so inference works with a single `from_pretrained` call.
## Model Details

### Model Description
- Base model: `meta-llama/Llama-3.2-3B-Instruct`
- Method: PEFT (LoRA)
- LoRA configuration (see the sketch after this list):
  - Rank (r): 8
  - Alpha: 32
  - Dropout: 0.05
  - Target modules: `q_proj`, `v_proj`
- Task: Customer-service question answering on banking FAQs
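For reference, the configuration above corresponds to a `peft.LoraConfig` roughly like the following. This is a minimal sketch: the hyperparameters are taken from the list above, while `task_type` is an assumption based on the base model being a causal LM.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor (alpha)
    lora_dropout=0.05,                    # dropout on the LoRA layers
    target_modules=["q_proj", "v_proj"],  # attention projections adapted
    task_type="CAUSAL_LM",                # assumption: causal LM fine-tuning
)
```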
### Metadata

- Developed by: Sardar Taimoor
- Finetuned by: [SardarTaimoor](https://huggingface.co/SardarTaimoor)
- Model type: Causal language model
- Language(s): English
- License: MIT
- Finetuned from: `meta-llama/Llama-3.2-3B-Instruct`
### Links

- Model repo: https://huggingface.co/SardarTaimoor/llama3b-lora
- Base model: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
- Training notebook: `Fine-Tuning.ipynb`
## How to Use this Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SardarTaimoor/llama3b-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # splits layers across GPU/CPU
    torch_dtype=torch.float16,  # half precision on GPU
    low_cpu_mem_usage=True,     # avoids fully materializing everything in host RAM
)

inputs = tokenizer("What's the Little Champs account?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## Training Details

### Data

- Dataset description: Custom customer-service FAQ dataset for bank products, formatted as JSONL with user prompts and assistant completions.
- Number of examples: 319 total (train: ~303, validation: ~16, after a 5% split)
- Preprocessing steps: Prompts and completions extracted and cleaned from the JSONL; tokenization via the original Llama tokenizer (a loading sketch follows this list).
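A minimal sketch of loading and splitting such a dataset with the `datasets` library. The file name, field layout, and split seed are assumptions, not taken from the repo:

```python
from datasets import load_dataset

# "faq.jsonl" is a hypothetical file name for the custom FAQ data.
dataset = load_dataset("json", data_files="faq.jsonl", split="train")

# 5% holdout for validation, matching the split described above.
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))  # ~303 / ~16 for 319 examples
```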
### Procedure

- Compute environment: Google Colab T4 GPU, Python 3
- Epochs: 20
- Batch size: 4 per device (gradient accumulation steps = 8, for an effective batch size of 32)
- Learning rate: 2e-5
- Precision: fp16 (see the `TrainingArguments` sketch below)
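These hyperparameters map onto `transformers.TrainingArguments` roughly as follows. This is a sketch only; `output_dir` and anything not listed above are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3b-lora",       # hypothetical output directory
    num_train_epochs=20,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # effective batch size of 32
    learning_rate=2e-5,
    fp16=True,                       # half-precision training on the T4
)
```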
Evaluation & Metrics
Evaluation dataset: 5% holdout from the custom FAQ dataset (~16 examples)
Metrics: BLEU, ROUGE, BERTScore
Results:
- BLEU: 0.0146
- ROUGE: rouge1=0.1083, rouge2=0.0281, rougeL=0.0816
- BERTScore (mean f1): 0.8211
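The scores above can be computed with the `evaluate` library along these lines. In this sketch, `predictions` and `references` stand in for the model outputs and gold completions on the holdout set:

```python
import evaluate

predictions = ["..."]  # model outputs on the ~16 holdout examples
references = ["..."]   # gold assistant completions

bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, lang="en"
)

print(bleu["bleu"], rouge["rouge1"], rouge["rouge2"], rouge["rougeL"])
print(sum(bertscore["f1"]) / len(bertscore["f1"]))  # mean BERTScore F1
```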
Limitations & Biases
- Known limitations: May hallucinate rare banking details; domain-restricted to the provided FAQ data.
- Potential biases: Reflects biases present in original Llama and the customer-service samples.
## License

This model is released under the MIT license. See the LICENSE file for details.

For questions or contributions, please open an issue on the model repo.