license: apache-2.0
base_model: google/gemma-3-4b-it
dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset
tags:
- gemma3
- instruction-tuned
- customer-support
- chatbot
- unsloth
- text-generation
- llm
language: en
pipeline_tag: text-generation
Nolan-Robbins/unsloth-gemma-3-4B-customer-support
This model is a fine-tuned version of Google's gemma-3-4b-it specifically adapted for customer support conversations. It was fine-tuned by Nolan Robbins using the Unsloth library for efficient LoRA training on the bitext/Bitext-customer-support-llm-chatbot-training-dataset.
Model Details
- Model Type: Causal Language Model (Decoder-only transformer)
- Base Model: google/gemma-3-4b-it
- Fine-tuning Library: Unsloth
- Fine-tuning Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset
- Language(s): Primarily English (due to the fine-tuning dataset)
- License: Apache 2.0, as declared in this card's metadata. The base model google/gemma-3-4b-it is distributed under Google's Gemma Terms of Use, and the fine-tuning dataset bitext/Bitext-customer-support-llm-chatbot-training-dataset is released under CDLA-Sharing-1.0.
Model Description
Nolan-Robbins/unsloth-gemma-3-4B-customer-support is a generative language model fine-tuned from Google's Gemma 3 4B instruction-tuned model. The Gemma 3 models are lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
This version has been specifically fine-tuned to enhance its capabilities in understanding and generating responses relevant to customer support scenarios. The fine-tuning process leveraged LoRA (Low-Rank Adaptation) for parameter-efficient training, made significantly faster and more memory-efficient by the Unsloth library.
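To make "parameter-efficient" concrete, the sketch below works through the standard LoRA arithmetic for a single weight matrix. It uses the rank and alpha listed under Training Procedure (r = 8, lora_alpha = 8). The 2560 x 2560 layer shape is an illustrative assumption rather than a value taken from this card, and the helper names are hypothetical.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by LoRA to one d_out x d_in weight.

    LoRA learns two low-rank factors, A (r x d_in) and B (d_out x r),
    so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    """Scale applied to the low-rank update B @ A (alpha / r)."""
    return lora_alpha / r


# Assumed layer shape, for illustration only.
d_in = d_out = 2560
r, lora_alpha = 8, 8  # values listed in this card's hyperparameters

full_params = d_in * d_out
lora_params = lora_param_count(d_in, d_out, r)
fraction = lora_params / full_params

print(f"full weight: {full_params} params")
print(f"LoRA update: {lora_params} params ({fraction:.4%} of the full weight)")
print(f"scaling factor: {lora_scaling(lora_alpha, r)}")
```

With alpha equal to the rank, the low-rank update is applied with a scaling factor of 1.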
The goal of this fine-tuning is to provide a model that is more adept at:
- Understanding customer inquiries and intents.
- Providing helpful and relevant support information.
- Generating polite and contextually appropriate responses in a customer service setting.
- Answering questions based on typical customer support topics like order tracking, account management, billing issues, and product information.
Intended Uses
This model is intended for use in various customer support applications, including but not limited to:
- Customer Service Chatbots: Powering conversational interfaces to answer customer questions, provide assistance, and guide users through support processes.
- Support Ticket Triage & Response Suggestion: Assisting human agents by categorizing incoming tickets or suggesting potential responses.
- FAQ Answering: Providing answers to frequently asked questions based on the knowledge embedded during fine-tuning.
- Drafting Customer Communications: Helping to generate drafts for emails or messages related to common customer support issues.
Example Prompts:
- "I want to track my recent order."
- "How can I reset my password?"
- "I'm having an issue with my bill, it seems higher than expected."
- "What is your return policy?"
- "Can you help me update my shipping address?"
How to Use
This model can be used with the Hugging Face Transformers library. Given it was fine-tuned with Unsloth and likely uses LoRA adapters, you'll typically load the base model and then apply the adapters if they are not merged. If the model has been merged and saved as a full fine-tuned model, the process is simpler.
Using transformers (assuming merged model or Unsloth handles adapter loading transparently):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Nolan-Robbins/unsloth-gemma-3-4B-customer-support"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Or torch.float16, depending on your hardware
    device_map="auto"
)
# Use the chat template (important for instruction-tuned models).
# For Gemma 3, the specific chat template might vary slightly from Gemma 2,
# so it's crucial to use the one associated with the model/tokenizer.
# The apply_chat_template method handles this.
# Example based on typical instruction-tuned model usage:
chat = [
{ "role": "user", "content": "Hi, I'd like to know how to cancel my subscription." }
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
# Gemma 3 recommended generation parameters
# (These are general recommendations; adjust as needed for your specific task)
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.7,  # Adjust for creativity vs. factuality
    top_p=0.95,
    top_k=50,
    do_sample=True
)
response_ids = outputs[0][inputs.shape[-1]:] # Get only the generated tokens
response = tokenizer.decode(response_ids, skip_special_tokens=True)
print(f"User: {chat[0]['content']}")
print(f"Assistant: {response}")
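For multi-turn conversations, the chat list passed to apply_chat_template grows by one dictionary per turn. Gemma chat templates expect user and assistant turns to alternate, so a small helper can catch ordering mistakes before the template does. The sketch below is a minimal illustration in plain Python; add_turn and trim_history are hypothetical helper names, the assistant reply is invented, and the sketch assumes no system message is used.

```python
def add_turn(chat, role, content):
    """Return a new chat list with one more turn, enforcing alternation."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role!r}")
    # Turns at even positions must come from the user, odd ones from the assistant.
    expected = "user" if len(chat) % 2 == 0 else "assistant"
    if role != expected:
        raise ValueError(f"expected a {expected!r} turn, got {role!r}")
    return chat + [{"role": role, "content": content}]


def trim_history(chat, max_turns):
    """Keep at most the last max_turns turns, starting on a user turn."""
    trimmed = chat[-max_turns:] if max_turns > 0 else []
    if trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


chat = []
chat = add_turn(chat, "user", "Hi, I'd like to know how to cancel my subscription.")
chat = add_turn(chat, "assistant", "Sure, I can help with that. When did you sign up?")
chat = add_turn(chat, "user", "I signed up last month.")

# Keep only recent context so the prompt stays short.
recent = trim_history(chat, max_turns=2)
print(recent)
```

The trimmed list can then be passed to tokenizer.apply_chat_template in place of the single-turn chat used above.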
Note on Unsloth and LoRA:
If you are loading LoRA adapters separately with Unsloth, the loading process might look slightly different, involving FastLanguageModel from Unsloth. Refer to the Unsloth GitHub repository for specific instructions on loading models fine-tuned with their library if the standard Hugging Face method above doesn't automatically apply adapters. For a model uploaded to the Hub after being merged, the standard method should generally work.
If adapters are separate, the process with Unsloth might look like:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Nolan-Robbins/unsloth-gemma-3-4B-customer-support",
    max_seq_length = YOUR_MAX_SEQ_LENGTH,
    dtype = YOUR_DTYPE,
    load_in_4bit = YOUR_LOAD_IN_4BIT_SETTING,
)
# For inference, you might not need to specify all training params again if loading from Hub.
# Check Unsloth documentation for precise inference loading of adapter-only models.
Training Data
This model was fine-tuned on the bitext/Bitext-customer-support-llm-chatbot-training-dataset.
- Dataset Description: This dataset is designed for training Large Language Models for customer service applications. It contains 26,872 question/answer pairs covering 27 common customer service intents (e.g., cancel_order, track_order, payment_issue, contact_human_agent) across 10 categories. The dataset includes 30 entity/slot types and has approximately 3.57 million tokens.
- Language: English
- Format: CSV with the columns 'flags', 'instruction', 'category', 'intent', and 'response'. The fine-tuning process mapped the 'instruction' and 'response' fields into a conversational structure suitable for Gemma's chat template.
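Before fine-tuning, it can help to check how examples are distributed across intents. The sketch below counts rows per intent with the standard library only. The three CSV rows are invented for illustration and are not taken from the dataset, and count_intents is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Invented sample rows that mimic the dataset's column layout.
SAMPLE_CSV = """flags,instruction,category,intent,response
B,I want to cancel my order,ORDER,cancel_order,I can help you cancel that order.
BL,where is my package,ORDER,track_order,Let me check the status for you.
B,how do I track my order,ORDER,track_order,You can track it from your account page.
"""


def count_intents(csv_text):
    """Count how many rows belong to each intent label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["intent"] for row in reader)


intent_counts = count_intents(SAMPLE_CSV)
for intent, count in intent_counts.most_common():
    print(f"{intent}: {count}")
```

On the real file, a heavily skewed count would suggest which intents the fine-tuned model has seen least.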
Training Procedure
The model was fine-tuned from google/gemma-3-4b-it using the Unsloth library, which enables significantly faster and more memory-efficient training, particularly for LoRA.
- Preprocessing: The training data was formatted to follow Gemma's chat template structure. A typical format might involve:
<start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{assistant_message}<end_of_turn>
The bitext dataset's 'instruction' and 'response' fields were mapped to this conversational format.
- Fine-tuning: LoRA (Low-Rank Adaptation) was employed to update a small fraction of the model's parameters, making the fine-tuning process efficient.
- Key Hyperparameters I used:
- LoRA rank (r): 8
- LoRA alpha (lora_alpha): 8
- Learning rate: e.g., 2e-4
- Number of epochs: e.g., 1
- Batch size: e.g., 16
- Optimizer: adamw_8bit
- Software: Python, PyTorch, Hugging Face Transformers, Datasets, and Unsloth.
- Additional Libraries: trl, wandb
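The preprocessing step described above, mapping each 'instruction' and 'response' pair onto the turn template, can be sketched as a plain string formatter. This is an illustration of the mapping rather than the exact training code: format_example is a hypothetical helper, the sample response is invented, and in practice the tokenizer's own apply_chat_template is the authoritative source and may add tokens such as a leading BOS.

```python
def format_example(instruction, response):
    """Render one instruction/response pair in the Gemma turn format."""
    return (
        "<start_of_turn>user\n"
        f"{instruction}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{response}<end_of_turn>\n"
    )


# Invented row using the dataset's field names.
row = {
    "instruction": "I want to track my recent order.",
    "response": "Sure, I can help you track your order.",
}

text = format_example(row["instruction"], row["response"])
print(text)
```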
Evaluation
Quantitative evaluation results (e.g., perplexity on a customer support test set, ROUGE scores, or intent accuracy) are not yet provided. Informal qualitative testing with the example prompts suggests improved handling of customer support queries compared to the base model, but this has not been measured.
Further evaluation could involve:
- Benchmarking against standard NLP tasks relevant to customer support (e.g., MMLU subset related to business/customer service, custom test sets).
- Human evaluation of response quality, relevance, helpfulness, and tone in simulated customer interactions.
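As a starting point for a custom test set, generated responses can be compared with reference answers using a simple overlap score. The sketch below computes a unigram-overlap F1 in the spirit of ROUGE-1. It is a simplified stand-in with whitespace tokenization, not the official ROUGE implementation, and the example sentences are invented.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """F1 over shared lowercase whitespace tokens (clipped counts)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "you can reset your password from the login page"
reference = "you can reset your password on the login page"
score = unigram_f1(prediction, reference)
print(f"unigram F1: {score:.3f}")
```

Overlap scores reward matching wording rather than correctness, so they complement human evaluation rather than replace it.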
Limitations and Bias
This model inherits most limitations and biases from the base google/gemma-3-4b-it model. Fine-tuning on the Bitext-customer-support-llm-chatbot-training-dataset introduces further specific considerations:
- Domain Specificity: While fine-tuned for customer support, the model's knowledge is primarily limited to the patterns and information present in the fine-tuning dataset. It may not perform as well on queries outside of common customer service topics or if specific company/product knowledge is required that wasn't in the training data.
- Data Bias: The Bitext dataset, while comprehensive, may contain its own biases (e.g., towards certain types of customer interactions, phrasing, or resolutions). The model may reflect these biases. For example, if the dataset predominantly features polite customer interactions, the model might struggle with or respond suboptimally to aggressive or unusual user inputs.
- Hallucination: Like all LLMs, this model can generate incorrect or nonsensical information (hallucinate), especially for queries that are ambiguous or require real-world knowledge not present in its training. In a customer support context, providing incorrect information can be particularly problematic.
- Repetitive Responses: For certain inputs or after long conversations, the model might become repetitive.
- Safety: The base Gemma models undergo safety filtering. However, fine-tuning can sometimes alter these characteristics. While the customer support dataset is generally benign, it's crucial to test for any unintended safety concerns. The model should not be used to generate harmful, unethical, or highly sensitive content.
- No Real-Time Knowledge: The model has no access to real-time information or customer-specific data unless explicitly provided in the prompt. It cannot access live account details, order statuses, etc., from backend systems.
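Because the model only knows what is in its prompt, customer-specific facts have to be fetched by the surrounding application and placed in the user turn. The sketch below shows one possible way to do that. build_messages, the field names, and the order values are all hypothetical, and the prompt wording is an assumption rather than a format the model was trained on.

```python
def build_messages(question, context):
    """Put backend facts into the user turn ahead of the question."""
    facts = "\n".join(f"- {key}: {value}" for key, value in context.items())
    content = (
        "Use only the account information below when answering.\n"
        f"{facts}\n\n"
        f"Customer question: {question}"
    )
    return [{"role": "user", "content": content}]


# Hypothetical values that a backend lookup might return.
order_context = {"order_id": "A-1001", "status": "shipped"}

messages = build_messages("Where is my order?", order_context)
print(messages[0]["content"])
```

The resulting list has the same shape as the chat list in How to Use and can be passed to apply_chat_template.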
Ethical Considerations and Risks
- Misinformation: The risk of providing incorrect or misleading information to customers is significant. Responses should be validated, especially when dealing with sensitive issues like billing, personal data, or critical services.
- Over-Reliance: Users might overly rely on the model for tasks requiring human judgment or empathy. It should be used as a tool to augment human support agents, not entirely replace them in complex or sensitive situations.
- Job Displacement: The deployment of advanced AI in customer support roles raises concerns about job displacement for human agents. Ethical deployment should consider how AI can assist and empower human workers.
- Privacy and Data Security: When interacting with the model, ensure no personally identifiable information (PII) or sensitive customer data is inadvertently logged or stored unless compliant with data protection regulations (e.g., GDPR, CCPA). The model itself doesn't retain data from individual conversations post-inference, but the infrastructure around it must be secure.
- Bias Amplification: If the training data contains biases (e.g., in how different customer demographics are addressed), the model might perpetuate or even amplify these biases.
Recommendations:
- Thoroughly test the model in a sandboxed environment before deploying to live customer interactions.
- Implement a human-in-the-loop system for reviewing and correcting responses, especially for critical queries.
- Be transparent with users that they are interacting with an AI.
- Continuously monitor the model's performance and gather user feedback for improvement.
- Do not use this model for financial, medical, or legal advice, or any high-stakes decisions without human oversight.
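One simple way to start on the human-in-the-loop recommendation is to route drafts that touch sensitive topics to an agent instead of sending them directly. The sketch below uses naive keyword matching. The term list, function names, and action labels are hypothetical, and a production system would likely need a more robust classifier.

```python
# Hypothetical list of topics that should always get human review.
SENSITIVE_TERMS = ("bill", "refund", "charge", "legal", "medical")


def needs_human_review(user_message, draft_response, terms=SENSITIVE_TERMS):
    """Flag a conversation if either side mentions a sensitive term."""
    text = f"{user_message} {draft_response}".lower()
    return any(term in text for term in terms)


def route(user_message, draft_response):
    """Decide whether a model draft is sent or queued for an agent."""
    if needs_human_review(user_message, draft_response):
        return {"action": "queue_for_agent", "draft": draft_response}
    return {"action": "send", "draft": draft_response}


flagged = route("I'm having an issue with my bill.", "I'm sorry to hear that.")
direct = route("What are your opening hours?", "We are open 9 to 5.")
print(flagged["action"], direct["action"])
```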
Environmental Impact
Fine-tuning existing large language models, especially using parameter-efficient methods like LoRA, is significantly less resource-intensive than training a model from scratch. The Unsloth library further optimizes this process for speed and memory, reducing the computational cost.
- Base Model Training: The environmental impact of training the base google/gemma-3-4b-it model was substantial. Google provides information on its sustainable computing efforts.
- Fine-tuning:
- Hardware Type: Nvidia A100 GPU
- Hours used: Approx. 30 mins
- Cloud Provider: Colab Pro
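A rough upper-bound estimate of the fine-tuning footprint follows from power x time x carbon intensity. The sketch below uses the 30 minutes reported above. The 400 W GPU power figure and the grid carbon intensity are assumptions chosen for illustration, and the estimate ignores CPU, cooling, and other datacenter overhead.

```python
def energy_kwh(power_kw, hours):
    """Energy drawn at a constant power level over a period."""
    return power_kw * hours


def emissions_kg(energy, kg_co2e_per_kwh):
    """Emissions for a given energy use and grid carbon intensity."""
    return energy * kg_co2e_per_kwh


hours = 0.5             # from this card: approx. 30 mins
gpu_power_kw = 0.4      # assumption: A100 running near a 400 W power limit
carbon_intensity = 0.4  # assumption: kg CO2e per kWh, varies widely by region

energy = energy_kwh(gpu_power_kw, hours)
emissions = emissions_kg(energy, carbon_intensity)
print(f"energy: {energy:.2f} kWh, emissions: {emissions:.3f} kg CO2e")
```

Actual figures depend on the real power draw and the region where the Colab runtime was hosted.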
Citation
Base Model (Gemma):
Please refer to the citation guidelines provided by Google for the Gemma models on their Hugging Face page or the official Gemma/Gemini documentation. A general citation might look like:
@misc{gemma_google_2024,
  author = {Google and Google DeepMind},
  title = {Gemma: Open Models Based on Gemini Research and Technology},
  year = {2024},
  publisher = {Google},
  url = {https://huggingface.co/google/gemma-3-4b-it}
}
Fine-tuning Dataset:
@misc{bitext_customer_support_dataset_2023,
  author = {Bitext},
  title = {Bitext Customer Support LLM Chatbot Training Dataset},
  year = {2023},
  publisher = {Hugging Face},
  version = {1.0.0},
  url = {https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset}
}
This Fine-tuned Model:
@misc{robbins_unsloth_gemma_3_4b_customer_support_2024,
  author = {Nolan Robbins},
  title = {unsloth-gemma-3-4B-customer-support: Fine-tuned Gemma 3 4B for Customer Support},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Nolan-Robbins/unsloth-gemma-3-4B-customer-support}
}
Unsloth (if desired):
Refer to the Unsloth GitHub repository (https://github.com/unslothai/unsloth) for any preferred citation format. Generally, citing the software by linking to the repository is appreciated.
Model Card Authors: Nolan Robbins
Model Card Contact: LinkedIn, Medium