---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
tags:
- NLP
- SentimentAnalysis
- LogisticRegression
- ScikitLearn
---
# 🧠 Sentiment Analysis with Logistic Regression

This model performs **multi-class sentiment analysis** on tweets, classifying them into the following categories:
- Positive
- Negative
- Neutral
- Irrelevant

It is a scikit-learn pipeline with the following steps:
<!-- - Text cleaning (URL, mention, hashtag, punctuation removal)-->
- CountVectorizer
- TF-IDF transformation
- Logistic Regression classifier (`max_iter=1000`)
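
The pipeline steps above can be sketched as follows. This is a minimal illustration, not the author's training script. The step names (`vect`, `tfidf`, `clf`) and the eight-tweet training set are hypothetical, only `max_iter=1000` comes from this card, and text cleaning is left out.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical step names; the released pickle may name its steps differently.
pipeline = Pipeline([
    ("vect", CountVectorizer()),                 # raw text -> token counts
    ("tfidf", TfidfTransformer()),               # counts -> TF-IDF weights
    ("clf", LogisticRegression(max_iter=1000)),  # multi-class classifier
])

# Tiny made-up training set for illustration only (not the real dataset).
texts = [
    "I love this phone, great battery",
    "Worst update ever, totally broken",
    "The patch notes were released today",
    "Anyone watching the football match tonight?",
    "Amazing graphics and smooth gameplay",
    "Terrible support, I want a refund",
    "The store opens at nine",
    "My cat just knocked over a plant",
]
labels = [
    "Positive", "Negative", "Neutral", "Irrelevant",
    "Positive", "Negative", "Neutral", "Irrelevant",
]

pipeline.fit(texts, labels)
print(pipeline.predict(["great phone, love the battery"])[0])
```

Because the whole chain lives in one `Pipeline` object, a single `predict` call on raw strings runs vectorization, TF-IDF weighting and classification in order.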

---

## 🏗 Model Architecture

<!-- - **TextCleaner**: Custom scikit-learn transformer for consistent text preprocessing.-->
- **CountVectorizer**: Converts tweets into token count vectors.
- **TfidfTransformer**: Reweights token counts by TF-IDF, down-weighting tokens that appear in many tweets.
- **LogisticRegression**: Linear classifier whose per-class coefficients are directly interpretable; a simple, strong baseline.

---

## 🧪 Evaluation

Evaluated on a separate validation set of 999 tweets:

| Class       | Precision | Recall | F1-score |
|-------------|-----------|--------|----------|
| Irrelevant  | 0.88      | 0.85   | 0.87     |
| Negative    | 0.87      | 0.94   | 0.91     |
| Neutral     | 0.97      | 0.86   | 0.91     |
| Positive    | 0.89      | 0.94   | 0.91     |
| **Overall Accuracy** |        |        | **0.90**     |
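
Per-class precision, recall, F1 and overall accuracy, as reported in the table above, can be computed with scikit-learn's `classification_report` and `accuracy_score`. The labels below are placeholders, not the actual validation data:

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder labels for illustration; in practice y_true holds the validation
# set's gold labels and y_pred = model.predict(validation_texts).
y_true = ["Positive", "Negative", "Neutral", "Irrelevant", "Positive", "Negative"]
y_pred = ["Positive", "Negative", "Neutral", "Positive", "Positive", "Negative"]

# zero_division=0 avoids a warning when a class is never predicted (here: Irrelevant).
print(classification_report(y_true, y_pred, digits=2, zero_division=0))
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")
```

Passing `output_dict=True` to `classification_report` returns the same numbers as a nested dict, which is convenient for logging or building a table like the one above.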

---

## 📦 Usage

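```python
import joblib

# The pickle holds the full pipeline (vectorizer, TF-IDF, classifier),
# so raw tweet strings can be passed directly.
model = joblib.load("sentiment_model_lr.pkl")
user_input = "This update is surprisingly good!"

prediction = model.predict([user_input])  # expects a list of strings
print(prediction[0])  # → Positive, Negative, etc.
```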
---
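> ⚠️ Requires scikit-learn 1.6.1+. Loading the pickle with a scikit-learn version other than the one used to save it can emit version-mismatch warnings (`InconsistentVersionWarning`).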

---

## 📚 Dataset

Tweets were preprocessed using a `clean_text` routine and labeled into the four sentiment categories. If you’d like to experiment or re-train, contact the author or fork this repo.
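
The card does not publish the `clean_text` routine. The following is a hypothetical re-implementation, assuming it strips URLs, @mentions, hashtags and punctuation. If cleaning happens outside the saved pipeline, the same function should be applied to inputs before calling `predict`.

```python
import re
import string

# Hypothetical re-implementation; the author's clean_text may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")  # drops the whole hashtag, not just the '#'
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Remove URLs, @mentions, hashtags and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # URLs first, so their '/' and '.' go with them
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


print(clean_text("Loving the new update!!! @devteam #win https://t.co/abc"))
```

Lowercasing is left out on purpose: `CountVectorizer` already lowercases by default (`lowercase=True`).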

---
## 🧑‍💻 Author

- **Built by:** @arshvir
- **Model version:** 1.0
- **License:** MIT

---