DistilBERT-Base-Uncased Quantized Model for Emotion Detection on Twitter Posts
This repository hosts a quantized version of the DistilBERT model, fine-tuned for emotion detection using the Emotion Dataset (CrowdFlower / dair-ai). The model has been optimized with FP16 quantization for efficient deployment without significant accuracy loss.
Model Details
- Model Architecture: DistilBERT Base Uncased
- Task: Multi-class classification
- Dataset: Emotion Dataset (CrowdFlower / dair-ai)
- Quantization: Float16
- Fine-tuning Framework: Hugging Face Transformers
Installation
pip install torch datasets transformers evaluate scikit-learn
Loading the Model
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
import torch
import torch.nn.functional as F

# Load tokenizer and model (the quantized weights live in quantized-model/; see Repository Structure)
model_path = "quantized-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Pick a device and move the model onto it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load the dataset once so its label names can be used to decode predictions
dataset = load_dataset("dair-ai/emotion")
# Define test sentences
texts = [
    "I am so excited about my vacation next week!",
    "I'm feeling really sad and hopeless right now.",
    "That movie made me laugh so hard!",
    "I'm nervous about the job interview tomorrow.",
    "Why does everything feel so frustrating lately?",
    "I can't believe I won! This is amazing!",
    "I miss you so much it's unbearable.",
    "That was such a disgusting experience.",
    "I'm scared of what might happen next.",
    "This is the best day of my life!"
]
# Tokenize and predict
def predict_emotion(text, model, tokenizer, device):
    """
    Predict the emotion of a given text using a loaded model and tokenizer.
    Args:
        text (str): Input sentence.
        model: Pretrained transformer model.
        tokenizer: Corresponding tokenizer.
        device: 'cuda' or 'cpu'.
    Returns:
        dict: Top predicted label and score.
    """
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probabilities = F.softmax(logits, dim=-1).cpu().numpy()[0]
    predicted_class = torch.argmax(logits, dim=-1).item()
    label_names = dataset["train"].features["label"].names
    predicted_label = label_names[predicted_class]
    return {
        "label": predicted_label,
        "confidence": round(float(probabilities[predicted_class]) * 100, 2)
    }
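With the pieces above in place, a small loop runs the classifier over the test sentences (the print format is just illustrative):

# Classify each of the test sentences defined earlier
for text in texts:
    result = predict_emotion(text, model, tokenizer, device)
    print(f"{result['label']:>10}  {result['confidence']:6.2f}%  {text}")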
Performance Metrics
- Accuracy: 0.940500
- Precision: 0.940811
- Recall: 0.940500
- F1 Score: 0.940539
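The evaluation script is not included in this repo; as a sanity check, a sketch like the following reproduces these metrics on the test split (weighted averaging is an assumption, chosen because precision, recall, and F1 are reported as single values):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Predict over the held-out test split and compare with the gold labels
label_names = dataset["train"].features["label"].names
golds, preds = [], []
for example in dataset["test"]:
    result = predict_emotion(example["text"], model, tokenizer, device)
    golds.append(example["label"])
    preds.append(label_names.index(result["label"]))

accuracy = accuracy_score(golds, preds)
precision, recall, f1, _ = precision_recall_fscore_support(golds, preds, average="weighted")
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")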
Fine-Tuning Details
Dataset
The dair-ai/emotion dataset contains 20,000 English text samples (mostly tweets) labeled with one of six basic emotions: joy, sadness, anger, fear, love, and surprise. It is widely used for training and evaluating emotion classification models in NLP.
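For reference, the splits and label names can be inspected directly:

from datasets import load_dataset

dataset = load_dataset("dair-ai/emotion")
print(dataset)  # DatasetDict with train/validation/test splits
print(dataset["train"].features["label"].names)
# ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']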
Training
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Evaluation strategy: epoch
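The exact training script is not included here; the following is a minimal sketch of how these hyperparameters map onto the Hugging Face Trainer API (the output directory and tokenization details are assumptions):

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

dataset = load_dataset("dair-ai/emotion")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)

args = TrainingArguments(
    output_dir="distilbert-emotion",    # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()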
Quantization
Post-training quantization was applied using PyTorch's half() method (FP16 precision) to reduce model size and inference time.
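Concretely, the conversion is a single call on the fine-tuned model (the checkpoint path below is illustrative):

# Convert the fine-tuned weights to FP16 and save them alongside the tokenizer
model = AutoModelForSequenceClassification.from_pretrained("distilbert-emotion")
model = model.half()
model.save_pretrained("quantized-model")
tokenizer.save_pretrained("quantized-model")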
Repository Structure
.
├── quantized-model/           # Contains the quantized model files
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── vocab.txt
│   ├── special_tokens_map.json
│   └── tokenizer.json
└── README.md                  # Model documentation
Limitations
- The model is trained specifically for multi-class emotion classification of English Twitter posts and may not generalize to other domains.
- FP16 quantization may result in slight numerical instability in edge cases.
Contributing
Feel free to open issues or submit pull requests to improve the model or documentation.