DistilBERT Ticket Classifier (Distil_Bert_V3)

Model Overview

This is a fine-tuned DistilBERT model (distilbert-base-cased) that classifies defect tickets and assigns them to the appropriate team based on their text content. It was trained on ticket data from Defect_ticket_V2.csv, with missing values in the Description, Comment, and Summary fields filled in during cleaning. It predicts one of 5 team labels, each linked to a team email for automated routing.

  • Model Type: DistilBERT for Sequence Classification
  • Framework: PyTorch
  • Repository: ZAM-ITI-110/Distil_Bert_V3
  • License: MIT (see YAML metadata above)
  • Created: February 2025
  • Creator: AUNGHLAINGTUN/Student ID6319250G NYP

Intended Use

This model is intended for:

  • Automating ticket assignment in IT support or defect tracking systems.
  • Reducing manual triage time by predicting the responsible team based on ticket text.

Use Case

  • Input: A ticket with fields Description, Comment, and Summary (e.g., "Urgent server crash reported in production").
  • Output: A team label (0-4) mapped to a team email (e.g., team1@example.com).
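The label-to-email step is a plain lookup. The sketch below uses a hypothetical table in which label 0 maps to team1@example.com and the remaining labels follow the same pattern. Only team1@example.com appears in this card, so substitute the real addresses and ordering.

```python
# Hypothetical label-to-email table: only team1@example.com appears in the
# model card; the other addresses and the 0 -> team1 ordering are assumptions.
TEAM_EMAILS = {
    0: "team1@example.com",
    1: "team2@example.com",
    2: "team3@example.com",
    3: "team4@example.com",
    4: "team5@example.com",
}


def route_ticket(label: int) -> str:
    """Return the team email for a predicted label (0-4)."""
    if label not in TEAM_EMAILS:
        raise ValueError(f"unknown team label: {label}")
    return TEAM_EMAILS[label]
```

For example, route_ticket(0) returns "team1@example.com", and an out-of-range label raises ValueError instead of silently misrouting a ticket.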

Out of Scope

  • Not designed for multi-label classification or sentiment analysis.
  • May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

Training Data

  • Dataset: Defect_ticket_v2.csv (private dataset)
  • Size: Approximately 5,000 samples (70% train: ~3,504, 15% validation: ~750, 15% test: ~750).
  • Features: Combined text from Description, Comment, and Summary columns.
  • Labels: 5 unique team labels (encoded as 0-4), derived from the Assigned Team column.
  • Preprocessing: Missing values filled with empty strings; text truncated/padded to 512 tokens.
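The preprocessing above can be sketched as follows, assuming each ticket arrives as a dict keyed by the column names. The single-space separator and the sorted-order label encoding (the same convention scikit-learn's LabelEncoder uses) are assumptions, not confirmed details of the original pipeline. Truncation and padding to 512 tokens are left to the tokenizer.

```python
from typing import Dict, List, Optional

# Column names taken from the card; the join order is an assumption.
TEXT_FIELDS = ("Description", "Comment", "Summary")


def combine_fields(ticket: Dict[str, Optional[str]]) -> str:
    """Treat missing (None) fields as empty strings and join the rest."""
    parts = [(ticket.get(field) or "").strip() for field in TEXT_FIELDS]
    # Empty parts are skipped so no stray spaces appear in the combined text.
    return " ".join(part for part in parts if part)


def encode_labels(teams: List[str]) -> Dict[str, int]:
    """Map each unique Assigned Team value to an integer id in sorted order."""
    return {team: idx for idx, team in enumerate(sorted(set(teams)))}
```

With five distinct Assigned Team values, encode_labels yields ids 0-4, matching the label range described above.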

Note: The dataset is not publicly available due to privacy constraints.
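The 70/15/15 split can be sketched with a single seeded shuffle. The seed and the use of a plain (non-stratified) random split are assumptions. The original split may have been stratified by team.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_15_15(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into 70% train, 15% validation, remainder test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

With 5,000 rows this yields 3,500 / 750 / 750. Deduplicating tickets before the shuffle keeps identical texts from landing in more than one split.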

Training Procedure

  • Base Model: distilbert-base-cased
  • Fine-Tuning:
    • Epochs: 5
    • Batch Size: 8
    • Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
    • Scheduler: Linear with 10% warmup steps
  • Hardware: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
  • Mixed Precision: Enabled via PyTorch AMP for efficiency.
  • Loss Function: CrossEntropyLoss
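To make the schedule concrete, the arithmetic below assumes ~3,504 training samples, batch size 8, 5 epochs, no gradient accumulation, and "10% warmup" meaning 10% of all optimizer steps. The learning-rate formula follows the linear warmup-then-decay shape of get_linear_schedule_with_warmup in transformers. It is a sketch for illustration, not the card's training code.

```python
import math

# Assumed values, taken from the card where available.
TRAIN_SAMPLES = 3504
BATCH_SIZE = 8
EPOCHS = 5
PEAK_LR = 3e-5

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # last partial batch kept
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps // 10  # 10% of all optimizer steps


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

Under these assumptions there are 438 steps per epoch, 2,190 steps in total, and 219 warmup steps, after which the learning rate peaks at 3e-5 and decays linearly to 0.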

Training Metrics

Epoch  Train Loss  Validation Loss  Validation Accuracy
1      0.4021      0.0038           100%
2      0.0031      0.0011           100%
3      0.0013      0.0006           100%
4      0.0008      0.0004           100%
5      0.0007      0.0004           100%
  • Test Accuracy: 100% (on ~750 test samples).

Evaluation

  • Performance: Achieved 100% accuracy on both the validation and test sets, indicating an excellent fit to the provided data.
  • Caveats:
    • Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits).
    • Real-world performance on new, unseen tickets should be validated.
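A quick check for the leakage caveat is to look for identical tickets across splits. The hypothetical helper below compares texts after lowercasing and collapsing whitespace. It only catches exact or near-exact copies, so paraphrased duplicates would need a fuzzier method.

```python
from typing import Iterable, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cross_split_duplicates(train: Iterable[str], test: Iterable[str]) -> Set[str]:
    """Return normalized ticket texts that appear in both splits."""
    train_set = {normalize(t) for t in train}
    return {normalize(t) for t in test} & train_set
```

A non-empty result means some test accuracy may come from memorized training tickets rather than generalization.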

How to Use

  • Predicts the appropriate team and email for up to 6 ticket descriptions.
  • Click 'Predict' for each ticket, or click 'Send Tickets' to process all tickets at once.

Installation

pip install transformers torch 
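A minimal inference sketch follows. It assumes the repository hosts standard transformers config, weights, and tokenizer files that the Auto classes can load, and that output index i corresponds to team label i. The 6-ticket cap mirrors the How to Use note and is optional. torch and transformers are imported lazily so the batch check runs without them.

```python
from functools import lru_cache
from typing import List

REPO_ID = "ZAM-ITI-110/Distil_Bert_V3"
MAX_TICKETS = 6  # mirrors the "up to 6 ticket descriptions" note; optional


def check_batch(texts: List[str]) -> List[str]:
    """Reject empty or oversized batches before running the model."""
    if not 1 <= len(texts) <= MAX_TICKETS:
        raise ValueError(f"expected 1-{MAX_TICKETS} tickets, got {len(texts)}")
    return texts


@lru_cache(maxsize=1)
def load_model():
    """Load tokenizer and model once; later calls reuse the cached pair."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def predict_labels(texts: List[str]) -> List[int]:
    """Tokenize to 512 tokens (as in training) and return argmax labels."""
    import torch

    tokenizer, model = load_model()
    inputs = tokenizer(
        check_batch(texts),
        truncation=True,
        padding="max_length",
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()
```

Calling predict_labels(["Urgent server crash reported in production"]) returns a one-element list containing a label in 0-4, which can then be mapped to the corresponding team email.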