DistilBERT Ticket Classifier (Distil_Bert_V3)

Model Overview

This is a fine-tuned DistilBERT model (distilbert-base-cased) that classifies defect tickets and assigns them to the appropriate team based on their text content. The ticket data from Defect_ticket_V2.csv was cleaned by filling missing values in the Description, Comment, and Summary fields. The model predicts one of 5 team labels, each linked to a team email for automated routing.

Model Type: DistilBERT for Sequence Classification
Framework: PyTorch
Repository: ZAM-ITI-110/Distil_Bert_V3
License: MIT (see YAML metadata above)
Created: February 2025
Creator: AUNGHLAINGTUN/Student ID6319250G NYP

Intended Use

This model is intended for:

Automating ticket assignment in IT support or defect tracking systems.
Reducing manual triage time by predicting the responsible team based on ticket text.

Use Case

Input: A ticket with fields Description, Comment, and Summary (e.g., "Urgent server crash reported in production").
Output: A team label (0-4) mapped to a team email (e.g., team1@example.com).
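
As an illustration of the output step, here is a minimal sketch of how a predicted label might be turned into a routing address. The TEAM_EMAILS table and the route_ticket helper are hypothetical: this card only gives team1@example.com as an example, so the ordering (label 0 to team 1, and so on) is an assumption made for illustration.

```python
# Hypothetical label-to-email table: the card only gives team1@example.com as
# an example, so the 0 -> team1 ... 4 -> team5 ordering is assumed here.
TEAM_EMAILS = {label: f"team{label + 1}@example.com" for label in range(5)}


def route_ticket(label):
    """Return the team email for a predicted label in the range 0-4."""
    if label not in TEAM_EMAILS:
        raise ValueError(f"expected a label between 0 and 4, got {label}")
    return TEAM_EMAILS[label]


print(route_ticket(0))  # team1@example.com
```

Rejecting out-of-range labels explicitly keeps a misconfigured model from silently routing tickets to a non-existent address.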

Out of Scope

Not designed for multi-label classification or sentiment analysis.
May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

Training Data

Dataset: Defect_ticket_v2.csv (private dataset)
Size: Approximately 5,000 samples (70% train: ~3,504, 15% validation: ~750, 15% test: ~750).
Features: Combined text from Description, Comment, and Summary columns.
Labels: 5 unique team labels (encoded as 0-4), derived from the Assigned Team column.
Preprocessing: Missing values filled with empty strings; text truncated/padded to 512 tokens.
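
The preparation steps listed above could look roughly like the following sketch. It uses plain Python dictionaries in place of the private CSV, and the helper names (combine_text, encode_labels, split_70_15_15), the space separator, the sorted label order, and the random seed are all assumptions rather than the original training code. Truncation and padding to 512 tokens happen later in the tokenizer and are not shown.

```python
import random

TEXT_COLUMNS = ("Description", "Comment", "Summary")


def combine_text(row):
    """Join the three text fields, treating missing values as empty strings."""
    parts = [(row.get(col) or "").strip() for col in TEXT_COLUMNS]
    return " ".join(p for p in parts if p)


def encode_labels(teams):
    """Map each distinct team name to an integer id (sorted for determinism)."""
    return {team: idx for idx, team in enumerate(sorted(set(teams)))}


def split_70_15_15(items, seed=42):
    """Shuffle a copy of items and split it into train / validation / test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test
```

With five distinct values in the Assigned Team column, encode_labels would yield ids 0-4, matching the label range described above.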

Note: The dataset is not publicly available due to privacy constraints.

Training Procedure

Base Model: distilbert-base-cased
Fine-Tuning:
- Epochs: 5
- Batch Size: 8
- Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
- Scheduler: Linear with 10% warmup steps
Hardware: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
Mixed Precision: Enabled via PyTorch AMP for efficiency.
Loss Function: CrossEntropyLoss
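
To make the scheduler settings concrete, the sketch below works out the step counts and the learning rate at a given step for a linear schedule with warmup: a linear ramp from 0 to the peak rate, followed by a linear decay back to 0. It assumes roughly 3,504 training samples, no gradient accumulation, and that the last partial batch is kept. The function names are hypothetical and this is not the original training script.

```python
import math


def total_training_steps(num_samples, batch_size, epochs):
    """Number of optimizer steps when every batch triggers one update."""
    return math.ceil(num_samples / batch_size) * epochs


def linear_warmup_lr(step, total_steps, peak_lr=3e-5, warmup_percent=10):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = total_steps * warmup_percent // 100
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


steps = total_training_steps(num_samples=3504, batch_size=8, epochs=5)
print(steps)                    # 2190 optimizer steps in total
print(steps * 10 // 100)        # 219 warmup steps
```

Under these assumptions the learning rate peaks at 3e-5 after 219 steps, about halfway through the first epoch.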

Training Metrics

Epoch	Train Loss	Validation Loss	Validation Accuracy
1	0.4021	0.0038	100%
2	0.0031	0.0011	100%
3	0.0013	0.0006	100%
4	0.0008	0.0004	100%
5	0.0007	0.0004	100%

Test Accuracy: 100% (on ~750 test samples).

Evaluation

Performance: The model reached 100% accuracy on both the validation and test sets, indicating a very close fit to the provided data.
Caveats:
- Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits).
- Real-world performance on new, unseen tickets should be validated.
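
One way to probe the leakage concern is to look for tickets whose text appears in more than one split. The sketch below is a hypothetical check, not part of the original pipeline. It lowercases and collapses whitespace only for comparison purposes, even though the model itself is cased.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical tickets match."""
    return " ".join(text.lower().split())


def find_overlap(split_a, split_b):
    """Return the normalized texts that occur in both splits."""
    seen = {normalize(t) for t in split_a}
    return sorted({normalize(t) for t in split_b} & seen)


train_texts = ["Server crash in PROD", "Login page fails"]
test_texts = ["server  crash in prod", "Printer offline"]
print(find_overlap(train_texts, test_texts))  # ['server crash in prod']
```

A non-empty result for train versus validation or train versus test would suggest that the reported accuracy partly reflects memorized duplicates.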

How to Use

Predicts the appropriate team and email for up to 6 ticket descriptions.
Click 'Predict' for an individual ticket, or click 'Send Tickets' to process all of them.

Installation

pip install transformers torch
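
With those packages installed, inference might look like the sketch below. It assumes that the ZAM-ITI-110/Distil_Bert_V3 repository hosts both the model weights and the tokenizer in a form the Transformers Auto classes can load, and that the three fields are joined with spaces in the order Description, Comment, Summary. Neither assumption is confirmed by this card, and build_input and predict_labels are hypothetical helper names.

```python
MODEL_ID = "ZAM-ITI-110/Distil_Bert_V3"  # repository named in this card


def build_input(description="", comment="", summary=""):
    """Combine the three ticket fields into one string, skipping empty ones."""
    parts = [(field or "").strip() for field in (description, comment, summary)]
    return " ".join(p for p in parts if p)


def predict_labels(texts, model_id=MODEL_ID):
    """Return one predicted label id per input text."""
    # Imported here so build_input stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Pad and truncate to 512 tokens, as described under Training Data.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

For example, predict_labels([build_input(description="Urgent server crash reported in production")]) would return a list containing a single label id, which can then be mapped to the corresponding team email.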
