BanglaBERT Fine-tuned for Bangla Sentiment Analysis

Model Description

This model is a fine-tuned version of csebuetnlp/banglabert on the SentiGOLD dataset for 5-class sentiment analysis in Bengali. It classifies text into:

  1. 😠 Very Negative (SN)
  2. 😞 Negative (WN)
  3. 😐 Neutral (N)
  4. 😊 Positive (WP)
  5. 😍 Very Positive (SP)

Key Features:

  • Built on BanglaBERT (csebuetnlp/banglabert), a pretrained Bangla language model
  • Handles both formal and informal Bengali text
  • Optimized for social media, reviews, and customer feedback
  • Requires input text to be normalized with Bangla Normalizer (the normalizer package imported in the usage examples) before tokenization

Intended Uses & Limitations

Primary Use

  • Sentiment analysis of Bengali text
  • Social media monitoring
  • Customer feedback analysis
  • Product review classification

Limitations

  • Performance may degrade on code-mixed text (Bengali-English)
  • May struggle with sarcasm and highly contextual expressions
  • Best for short to medium-length texts; the encoder accepts at most 512 tokens, so longer inputs must be truncated or split
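
Because performance may degrade on code-mixed input, it can help to flag such text before classification. The following is a minimal sketch, not part of the model: the function names and the 0.8 threshold are hypothetical, and the check simply measures how many letters fall in the Bengali Unicode block (U+0980 to U+09FF).

```python
def bengali_letter_ratio(text: str) -> float:
    """Return the share of letters in `text` that are Bengali-script letters."""
    # str.isalpha() is True for base letters only, so combining vowel signs,
    # digits, punctuation, and emoji are ignored on both sides of the ratio.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for ch in letters if "\u0980" <= ch <= "\u09ff")
    return bengali / len(letters)


def is_mostly_bengali(text: str, min_ratio: float = 0.8) -> bool:
    """Hypothetical gate: True when at least `min_ratio` of letters are Bengali."""
    return bengali_letter_ratio(text) >= min_ratio


samples = ["আমি খুবই সন্তুষ্ট", "service টা খুব slow ছিল"]
for sample in samples:
    print(f"{bengali_letter_ratio(sample):.2f}", is_mostly_bengali(sample), sample)
```

Texts that fail the gate could be logged or sent for manual review instead of being scored automatically; the right threshold depends on your data.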

Training Data

The model was fine-tuned on SentiGOLD, the largest gold-standard Bangla sentiment analysis dataset:

Feature              Value
Total Samples        70,000
Domains Covered      30+
Source Diversity     Social media, news, blogs, reviews
Class Distribution   Balanced across 5 classes
Annotation Quality   Fleiss' kappa = 0.88

Training Procedure

Hyperparameters

Parameter               Value
Learning Rate           2e-5 → 1.05e-6
Batch Size              48
Epochs                  5
Optimizer               AdamW
Scheduler               ReduceLROnPlateau
Weight Decay            0.01
Gradient Accumulation   4 steps
Warmup Ratio            5%
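
To see how these settings interact, the sketch below works out the effective batch size and an approximate step count. It assumes the batch size of 48 is per optimizer micro-step on a single device, and it uses a hypothetical training-split size, since the card does not state one. Training frameworks may round step counts slightly differently.

```python
import math

# Values taken from the hyperparameter table above.
BATCH_SIZE = 48
GRAD_ACCUM_STEPS = 4
EPOCHS = 5
WARMUP_RATIO = 0.05

# Hypothetical: the size of the training split is not stated in this card.
N_TRAIN_EXAMPLES = 56_000


def training_schedule(n_examples, batch_size, accum_steps, epochs, warmup_ratio):
    """Derive effective batch size and approximate optimizer step counts."""
    effective_batch = batch_size * accum_steps  # examples per optimizer update
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(
    N_TRAIN_EXAMPLES, BATCH_SIZE, GRAD_ACCUM_STEPS, EPOCHS, WARMUP_RATIO
)
print(schedule)
```

Under these assumptions each optimizer update sees 48 × 4 = 192 examples.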

Techniques

  • Class-weighted loss to handle class imbalance
  • Early stopping (patience=3)
  • Mixed precision (FP16) training
  • Gradient checkpointing
  • Text normalization using Bangla Normalizer
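
The card does not say how the class weights were derived. One common heuristic is inverse-frequency ("balanced") weighting, where each class gets n_samples / (n_classes × class_count). The sketch below illustrates that heuristic on a hypothetical toy label list; it is not necessarily the scheme used for this model.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count) for label, count in counts.items()
    }


# Hypothetical toy distribution, for illustration only (not SentiGOLD counts).
toy_labels = ["SN"] * 10 + ["WN"] * 20 + ["N"] * 40 + ["WP"] * 20 + ["SP"] * 10
weights = balanced_class_weights(toy_labels)

for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

In PyTorch, weights like these can be passed, ordered by class index, as the weight tensor of torch.nn.CrossEntropyLoss.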

Evaluation Results

Validation Performance

Epoch  F1 (Macro)  Accuracy  Very Neg F1  Neg F1  Neu F1  Pos F1  Very Pos F1
1      0.6334      0.6331    0.6789       0.5834  0.6407  0.5635  0.7004
5      0.6537      0.6551    0.7081       0.6157  0.6421  0.5789  0.7236
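
Macro F1 is the unweighted mean of the per-class F1 scores, so the first column can be recomputed from the five per-class columns. The short check below does this with the values reported in the validation table; the helper name is illustrative.

```python
# Per-class F1 from the validation table, in the order:
# Very Neg, Neg, Neu, Pos, Very Pos.
per_class_f1 = {
    1: [0.6789, 0.5834, 0.6407, 0.5635, 0.7004],
    5: [0.7081, 0.6157, 0.6421, 0.5789, 0.7236],
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(scores) / len(scores)


for epoch, scores in per_class_f1.items():
    print(f"epoch {epoch}: macro F1 = {macro_f1(scores):.4f}")
```

Both rows reproduce the reported macro F1 values (0.6334 and 0.6537) to four decimal places.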

Final Test Performance

Metric     Score
Macro F1   0.6660
Accuracy   0.6671

How to Use

Direct Inference

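from transformers import pipeline
# Bangla Normalizer from csebuetnlp; install with:
#   pip install git+https://github.com/csebuetnlp/normalizer
from normalizer import normalize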

# Load model
classifier = pipeline(
    "text-classification", 
    model="ahs95/banglabert-sentiment-analysis",
    tokenizer="ahs95/banglabert-sentiment-analysis"
)

# Prepare text
text = "আপনার পণ্যটি অসাধারণ! আমি খুবই সন্তুষ্ট।"
normalized_text = normalize(text)  # Important for BanglaBERT

# Classify
result = classifier(normalized_text)
print(f"Sentiment: {result[0]['label']} (Confidence: {result[0]['score']:.2f})")

Advanced Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from normalizer import normalize

# Load model and tokenizer
model_name = "ahs95/banglabert-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare inputs
texts = [
    "সেবা খুব খারাপ ছিল। আমি কখনো ফিরে আসব না।",
    "পণ্যটির গুণগত মান মোটামুটি ভাল"
]
normalized_texts = [normalize(t) for t in texts]

# Tokenize (inputs longer than 512 tokens are truncated) and predict
inputs = tokenizer(
    normalized_texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get predictions. The label order below is assumed to match the model's class
# indices; check model.config.id2label to confirm.
sentiment_labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
predictions = [sentiment_labels[i] for i in probabilities.argmax(dim=1).tolist()]

for text, pred in zip(texts, predictions):
    print(f"Text: {text}\nPredicted Sentiment: {pred}\n")

Ethical Considerations

  • Bias: While SentiGOLD reduces bias through synthetic data, validate the model on real-world data from your own domain before relying on it

  • Suitable use cases:

    • Product feedback analysis
    • Social media monitoring
    • Market research

  • Avoid: Critical decision systems without human oversight
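
One simple way to keep a human in the loop is to send low-confidence predictions to manual review. The sketch below assumes predictions in the {"label", "score"} dictionary format returned by the text-classification pipeline; the function name, the 0.7 threshold, and the sample predictions are all hypothetical.

```python
def route_prediction(prediction: dict, min_confidence: float = 0.7) -> str:
    """Return "auto" for confident predictions, "human_review" otherwise."""
    if prediction["score"] >= min_confidence:
        return "auto"
    return "human_review"


# Hypothetical predictions; the label strings the model actually emits may differ.
predictions = [
    {"label": "Very Positive", "score": 0.93},
    {"label": "Negative", "score": 0.48},
]

routes = [route_prediction(p) for p in predictions]
for prediction, route in zip(predictions, routes):
    print(prediction["label"], prediction["score"], "->", route)
```

Given the reported test accuracy of 0.6671, a threshold should be tuned on held-out data rather than taken from this example.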

Citation

If you use this model, please cite:

@misc{banglabert-sentiment,
  author = {Arshadul Hoque},
  title = {Fine-tuned BanglaBERT for Bengali Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ahs95/banglabert-sentiment-analysis}}
}

Contact

For questions and support: [email protected]
