BanglaBERT Fine-tuned for Bangla Sentiment Analysis
Model Description
This model is a fine-tuned version of csebuetnlp/banglabert on the SentiGOLD dataset for 5-class sentiment analysis in Bengali. It classifies text into:
- 😠 Very Negative (SN)
- 😞 Negative (WN)
- 😐 Neutral (N)
- 😊 Positive (WP)
- 😍 Very Positive (SP)
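The short codes in parentheses can be kept in a small lookup table when post-processing predictions. The sketch below is illustrative only: it assumes class index `i` corresponds to position `i` in the list above, and `polarity` is a hypothetical helper that is not part of the model or its tooling.

```python
# Short codes and display names, in the order the card lists them.
# ASSUMPTION: class index i corresponds to position i in this list.
SENTIMENT_CLASSES = [
    ("SN", "Very Negative"),
    ("WN", "Negative"),
    ("N", "Neutral"),
    ("WP", "Positive"),
    ("SP", "Very Positive"),
]

CODE_TO_NAME = dict(SENTIMENT_CLASSES)


def polarity(code: str) -> int:
    """Map a short code to a signed score from -2 to +2 (hypothetical helper)."""
    codes = [c for c, _ in SENTIMENT_CLASSES]
    return codes.index(code) - 2
```

A signed score like this can be convenient for averaging sentiment over many texts, but it treats the classes as evenly spaced, which is a modelling choice rather than something the dataset guarantees.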
Key Features:
- Built on BanglaBERT, a model pretrained for Bangla language understanding
- Handles both formal and informal Bengali text
- Optimized for social media, reviews, and customer feedback
- Requires text normalization using Bangla Normalizer
Intended Uses & Limitations
Primary Use
- Sentiment analysis of Bengali text
- Social media monitoring
- Customer feedback analysis
- Product review classification
Limitations
- Performance may degrade on code-mixed text (Bengali-English)
- May struggle with sarcasm and highly contextual expressions
- Best for short to medium-length texts (the model accepts at most 512 tokens per input)
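For texts that exceed the 512-token limit, one common workaround is to split the token sequence into overlapping windows, classify each window, and combine the results. The card does not say how long inputs were handled, so the sketch below is an assumption. `split_into_windows` is a hypothetical helper that works on a plain list of token ids, and the reserved count of 2 assumes one start and one end special token per input.

```python
def split_into_windows(token_ids, max_len=512, overlap=64, num_special=2):
    """Split token ids into overlapping windows that fit the model limit.

    max_len     -- model input limit (512 per the card)
    overlap     -- tokens shared between consecutive windows (assumed value)
    num_special -- slots reserved for special tokens (assumed: 2)
    """
    body = max_len - num_special
    if body <= 0:
        raise ValueError("max_len must be larger than num_special")
    if not 0 <= overlap < body:
        raise ValueError("overlap must be in [0, max_len - num_special)")

    step = body - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break
        start += step
    return windows


# Example with 1000 placeholder token ids (not real tokenizer output)
windows = split_into_windows(list(range(1000)))
```

How to combine the per-window predictions (for example, averaging the class probabilities) is a separate design choice that should be validated on your own data.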
Training Data
The model was fine-tuned on SentiGOLD, the largest gold-standard Bangla sentiment analysis dataset:
Feature | Value |
---|---|
Total Samples | 70,000 |
Domains Covered | 30+ |
Source Diversity | Social media, news, blogs, reviews |
Class Distribution | Balanced across 5 classes |
Annotation Quality | Fleiss' kappa = 0.88 |
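Fleiss' kappa measures agreement among several annotators beyond what chance would produce. As a rough illustration of how the statistic is computed, the sketch below implements the standard formula on a count matrix. The toy ratings are made up and are not taken from SentiGOLD.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    annotators who assigned item i to category j.
    Every row must sum to the same number of annotators (at least 2)."""
    num_items = len(counts)
    num_raters = sum(counts[0])
    num_categories = len(counts[0])

    # Share of all assignments that went to each category
    p_cat = [
        sum(row[j] for row in counts) / (num_items * num_raters)
        for j in range(num_categories)
    ]
    # Observed agreement per item
    p_item = [
        (sum(c * c for c in row) - num_raters) / (num_raters * (num_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / num_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 4 items, 3 annotators, 5 sentiment classes (illustrative only)
toy_counts = [
    [3, 0, 0, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 0, 3, 0],
    [0, 0, 0, 1, 2],
]
toy_kappa = fleiss_kappa(toy_counts)
```

The function divides by `1 - p_expected`, so it is undefined when every annotation falls into a single category.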
Training Procedure
Hyperparameters
Parameter | Value |
---|---|
Learning Rate | 2e-5 → 1.05e-6 |
Batch Size | 48 |
Epochs | 5 |
Optimizer | AdamW |
Scheduler | ReduceLROnPlateau |
Weight Decay | 0.01 |
Gradient Accumulation | 4 steps |
Warmup Ratio | 5% |
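The batch size and gradient accumulation settings combine into an effective batch size per optimizer update. The sketch below works through that arithmetic. It assumes the batch size of 48 is per accumulation step on a single device, and the training-set size of 56,000 is a hypothetical figure for illustration, since the card does not state the train/validation/test split.

```python
import math


def training_schedule(num_train_examples, batch_size=48, grad_accum=4,
                      epochs=5, warmup_ratio=0.05):
    """Approximate optimizer-step counts from the hyperparameter table."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# HYPOTHETICAL training-set size; the real split is not given in the card.
schedule = training_schedule(num_train_examples=56_000)
print(schedule["effective_batch"])  # 48 * 4 = 192
```

Training frameworks differ in how they round partial batches, so the step counts are estimates rather than exact values.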
Techniques
- Class-weighted loss to handle class imbalance
- Early stopping (patience=3)
- Mixed precision (FP16) training
- Gradient checkpointing
- Text normalization using Bangla Normalizer
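As an illustration of the class-weighted loss listed above, the sketch below derives inverse-frequency weights and applies them to a cross-entropy term for a single example. The card does not state which weighting scheme was used, so the "balanced" formula here (total / (classes × class count)) is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: total / (num_classes * count per class).
    Classes that never occur get weight 0.0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return weights[target] * (log_sum_exp - logits[target])


# Toy label distribution (illustrative only, not SentiGOLD counts)
toy_labels = [0, 0, 0, 1]
toy_weights = balanced_class_weights(toy_labels, num_classes=2)
```

With these weights, a mistake on the rarer class contributes more to the loss than a mistake on the more frequent one.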
Evaluation Results
Validation Performance
Epoch | F1 (Macro) | Accuracy | Very Neg F1 | Neg F1 | Neu F1 | Pos F1 | Very Pos F1 |
---|---|---|---|---|---|---|---|
1 | 0.6334 | 0.6331 | 0.6789 | 0.5834 | 0.6407 | 0.5635 | 0.7004 |
5 | 0.6537 | 0.6551 | 0.7081 | 0.6157 | 0.6421 | 0.5789 | 0.7236 |
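Macro F1 is the unweighted mean of the per-class F1 scores, so the rows above can be checked by hand. The short sketch below reproduces the macro values from the per-class columns of the table.

```python
# Per-class F1 from the validation table, in column order:
# Very Neg, Neg, Neu, Pos, Very Pos
per_class_f1 = {
    1: [0.6789, 0.5834, 0.6407, 0.5635, 0.7004],
    5: [0.7081, 0.6157, 0.6421, 0.5789, 0.7236],
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(scores) / len(scores)


macro_by_epoch = {epoch: macro_f1(scores) for epoch, scores in per_class_f1.items()}
for epoch, value in macro_by_epoch.items():
    print(f"Epoch {epoch}: macro F1 = {value:.4f}")
```

The Positive class has the lowest F1 in both rows, so it pulls the macro average down the most.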
Final Test Performance
Metric | Score |
---|---|
Macro F1 | 0.6660 |
Accuracy | 0.6671 |
How to Use
Direct Inference
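# Requires the csebuetnlp normalizer package:
#   pip install git+https://github.com/csebuetnlp/normalizer
from transformers import pipeline
from normalizer import normalize

# Load the fine-tuned model and its tokenizer
classifier = pipeline(
    "text-classification",
    model="ahs95/banglabert-sentiment-analysis",
    tokenizer="ahs95/banglabert-sentiment-analysis",
)

# Normalize the input text (important for BanglaBERT)
# English gloss: "Your product is excellent! I am very satisfied."
text = "আপনার পণ্যটি অসাধারণ! আমি খুবই সন্তুষ্ট।"
normalized_text = normalize(text)

# Classify and print the top label with its confidence score
result = classifier(normalized_text)
print(f"Sentiment: {result[0]['label']} (Confidence: {result[0]['score']:.2f})")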
Advanced Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from normalizer import normalize

# Load model and tokenizer
model_name = "ahs95/banglabert-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare inputs
texts = [
    "সেবা খুব খারাপ ছিল। আমি কখনো ফিরে আসব না।",  # "The service was very bad. I will never come back."
    "পণ্যটির গুণগত মান মোটামুটি ভাল",  # "The product's quality is fairly good."
]
normalized_texts = [normalize(t) for t in texts]

# Tokenize (inputs longer than the model maximum are truncated) and predict
inputs = tokenizer(normalized_texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each argmax class index to its label
sentiment_labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
predictions = [sentiment_labels[i] for i in probabilities.argmax(dim=1).tolist()]
for text, pred in zip(texts, predictions):
    print(f"Text: {text}\nPredicted Sentiment: {pred}\n")
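The softmax step above turns raw logits into a probability for every class, not only the top one. The dependency-free sketch below shows that computation and ranks all five labels. The logits are hypothetical values chosen for illustration, not real model output, and `rank_sentiments` is a helper introduced here.

```python
import math

SENTIMENT_LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_sentiments(logits, labels=SENTIMENT_LABELS):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# HYPOTHETICAL logits for one input (not real model output)
example_logits = [-1.2, -0.3, 0.4, 2.1, 1.7]
ranked = rank_sentiments(example_logits)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

Looking at the runner-up class can be informative when two adjacent classes, such as Positive and Very Positive, receive similar probabilities.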
Ethical Considerations
Bias: While SentiGOLD reduces bias through synthetic data, real-world validation is recommended
Use Cases: Suitable for:
- Product feedback analysis
- Social media monitoring
- Market research
Avoid: Critical decision systems without human oversight
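One simple way to keep a human in the loop is to route low-confidence predictions to manual review. The sketch below is a minimal illustration: it assumes each prediction is a dict with `label` and `score` keys, as in the pipeline output used in this card. The 0.6 threshold is an arbitrary placeholder that should be tuned on held-out data, and `triage` is a hypothetical helper.

```python
def triage(predictions, threshold=0.6):
    """Split predictions into auto-accepted and human-review lists.

    predictions -- iterable of dicts with 'label' and 'score' keys
    threshold   -- minimum score to accept automatically (assumed value)
    """
    accepted, review = [], []
    for pred in predictions:
        if pred["score"] < threshold:
            review.append(pred)
        else:
            accepted.append(pred)
    return accepted, review


# Made-up predictions for illustration (not real model output)
sample = [
    {"label": "Positive", "score": 0.91},
    {"label": "Neutral", "score": 0.42},
]
accepted, review = triage(sample)
```

Confidence scores from a fine-tuned classifier are not guaranteed to be well calibrated, so a threshold like this reduces rather than removes the need for oversight.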
Citation
If you use this model, please cite:
@misc{banglabert-sentiment,
author = {Arshadul Hoque},
title = {Fine-tuned BanglaBERT for Bengali Sentiment Analysis},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ahs95/banglabert-sentiment-analysis}}
}
Contact
For questions and support: [email protected]