DistilBERT Ticket Classifier (Distil_Bert_V3)
Model Overview
This is a fine-tuned DistilBERT model (`distilbert-base-cased`) designed to classify defect tickets and assign them to the appropriate team based on their text content. It was trained on ticket data from `Defect_ticket_V2.csv`, cleaned by filling missing values in the ticket `Description`, `Comment`, and `Summary` fields, and predicts one of 5 team labels, each linked to a team email for automated routing.
- Model Type: DistilBERT for Sequence Classification
- Framework: PyTorch
- Repository: ZAM-ITI-110/Distil_Bert_V3
- License: MIT (see YAML metadata above)
- Created: February 2025
- Creator: AUNGHLAINGTUN (Student ID 6319250G, NYP)
Intended Use
This model is intended for:
- Automating ticket assignment in IT support or defect tracking systems.
- Reducing manual triage time by predicting the responsible team based on ticket text.
Use Case
- Input: A ticket with `Description`, `Comment`, and `Summary` fields (e.g., "Urgent server crash reported in production").
- Output: A team label (0-4) mapped to a team email (e.g., `team1@example.com`), as sketched below.
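The routing step itself is just a lookup from the predicted label to a team inbox. The sketch below uses placeholder addresses (the real mapping is not published):

```python
# Hypothetical label-to-email routing table; addresses are placeholders only.
TEAM_EMAILS = {
    0: "team1@example.com",
    1: "team2@example.com",
    2: "team3@example.com",
    3: "team4@example.com",
    4: "team5@example.com",
}

def route(label: int) -> str:
    """Return the inbox responsible for a predicted team label (0-4)."""
    return TEAM_EMAILS[label]
```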
Out of Scope
- Not designed for multi-label classification or sentiment analysis.
- May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).
Training Data
- Dataset: `Defect_ticket_v2.csv` (private dataset)
- Size: Approximately 5,000 samples (70% train: ~3,504; 15% validation: ~750; 15% test: ~750).
- Features: Combined text from the `Description`, `Comment`, and `Summary` columns.
- Labels: 5 unique team labels (encoded as 0-4), derived from the `Assigned Team` column.
- Preprocessing: Missing values filled with empty strings; text truncated/padded to 512 tokens (a preprocessing sketch follows the note below).
Note: The dataset is not publicly available due to privacy constraints.
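The exact cleaning script is not published; the following is a minimal sketch of the preprocessing described above, assuming a pandas DataFrame with the listed columns:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from transformers import DistilBertTokenizerFast

# Load the private dataset (path is illustrative; the CSV is not publicly available).
df = pd.read_csv("Defect_ticket_v2.csv")

# Fill missing values with empty strings and combine the three text columns.
for col in ["Description", "Comment", "Summary"]:
    df[col] = df[col].fillna("")
df["text"] = df["Description"] + " " + df["Comment"] + " " + df["Summary"]

# Encode the 5 team names from the 'Assigned Team' column as integer labels 0-4.
labels = LabelEncoder().fit_transform(df["Assigned Team"])

# 70% train / 15% validation / 15% test split.
train_texts, tmp_texts, train_labels, tmp_labels = train_test_split(
    df["text"].tolist(), labels, test_size=0.30, random_state=42, stratify=labels)
val_texts, test_texts, val_labels, test_labels = train_test_split(
    tmp_texts, tmp_labels, test_size=0.50, random_state=42, stratify=tmp_labels)

# Tokenize with truncation/padding to 512 tokens, as described above.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-cased")
train_enc = tokenizer(train_texts, truncation=True, padding="max_length", max_length=512)
val_enc = tokenizer(val_texts, truncation=True, padding="max_length", max_length=512)
test_enc = tokenizer(test_texts, truncation=True, padding="max_length", max_length=512)
```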
Training Procedure
- Base Model: `distilbert-base-cased`
- Fine-Tuning (a sketch of the training loop follows this list):
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
  - Scheduler: Linear with 10% warmup steps
- Hardware: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
- Mixed Precision: Enabled via PyTorch AMP for efficiency.
- Loss Function: CrossEntropyLoss
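The original training script is not included in the repository; the loop below is a sketch reconstructed from the hyperparameters above (AdamW at 3e-5, linear schedule with 10% warmup, PyTorch AMP, batch size 8), reusing `train_enc` and `train_labels` from the preprocessing sketch:

```python
import torch
from torch.cuda.amp import GradScaler, autocast
from torch.utils.data import DataLoader, TensorDataset
from transformers import DistilBertForSequenceClassification, get_linear_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-cased", num_labels=5).to(device)

# Batch size 8, as listed above.
train_ds = TensorDataset(
    torch.tensor(train_enc["input_ids"]),
    torch.tensor(train_enc["attention_mask"]),
    torch.tensor(train_labels),
)
train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)

epochs = 5
total_steps = epochs * len(train_loader)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps)
scaler = GradScaler()  # PyTorch AMP mixed precision

model.train()
for epoch in range(epochs):
    for input_ids, attention_mask, labels in train_loader:
        optimizer.zero_grad()
        with autocast():
            # Passing labels makes the model compute CrossEntropyLoss internally.
            out = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device),
                        labels=labels.to(device))
        scaler.scale(out.loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
```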
Training Metrics
| Epoch | Train Loss | Validation Loss | Validation Accuracy |
|-------|------------|-----------------|---------------------|
| 1     | 0.4021     | 0.0038          | 100%                |
| 2     | 0.0031     | 0.0011          | 100%                |
| 3     | 0.0013     | 0.0006          | 100%                |
| 4     | 0.0008     | 0.0004          | 100%                |
| 5     | 0.0007     | 0.0004          | 100%                |
- Test Accuracy: 100% (on ~750 test samples).
Evaluation
- Performance: Achieved 100% accuracy on both validation and test sets, indicating an excellent fit to the provided data.
- Caveats:
  - Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits); a quick duplicate check is sketched below.
  - Real-world performance on new, unseen tickets should be validated.
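One quick, hedged way to test the duplicate concern, reusing the split variables from the preprocessing sketch above, is to count exact text overlaps between the train and test splits:

```python
# Count exact-duplicate ticket texts shared between the train and test splits.
train_set = set(train_texts)
overlap = sum(1 for t in test_texts if t in train_set)
print(f"{overlap} of {len(test_texts)} test tickets also appear verbatim in the training split")
```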
How to Use
- The accompanying demo predicts the appropriate team and email for up to 6 ticket descriptions.
- Click 'Predict' for an individual ticket, or 'Send Tickets' to process all tickets at once. A Python inference example follows the installation command below.
Installation
```bash
pip install transformers torch
```
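A minimal inference sketch, assuming the repository hosts standard `transformers` weights; the ticket text and the final label-to-email mapping are illustrative only:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "ZAM-ITI-110/Distil_Bert_V3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

ticket = "Urgent server crash reported in production"
inputs = tokenizer(ticket, truncation=True, padding="max_length",
                   max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
label = int(logits.argmax(dim=-1))
print(f"Predicted team label: {label}")  # map to a team inbox, e.g. team1@example.com
```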