---
datasets:
- nhull/tripadvisor-split-dataset-v2
tags:
- nlp
- hotels
- reviews
- sentiment-analysis
- rnn
- deep-learning
---
# LSTM Sentiment Analysis Model

This is a custom one-layer LSTM model trained for sentiment analysis on the TripAdvisor dataset. The model predicts a sentiment score on a scale of 1 to 5 from review text.

- **Architecture**: Custom one-layer LSTM
- **Dataset**: `nhull/tripadvisor-split-dataset-v2`
- **Use Case**: Sentiment classification of customer reviews to gauge customer satisfaction
- **Output**: Sentiment labels (1–5)

---

## Model Details

- **Embedding**: 100-dimensional pre-trained GloVe embeddings
- **Learning Rate**: 3e-4
- **Batch Size**: 64
- **Epochs**: 20 (early stopping with patience = 3)
- **Dropout**: 0.2
- **Tokenizer**: Custom tokenizer (vocabulary size: 10,000)
- **Framework**: TensorFlow/Keras

---
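
Given these hyperparameters, the architecture can be sketched in Keras roughly as follows. The LSTM hidden size is not listed on this card, so the 128 units below are an assumption, and in the actual model the embedding layer is initialized with the GloVe vectors:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(vocab_size=10_000, embed_dim=100, lstm_units=128, num_classes=5):
    """One-layer LSTM classifier; lstm_units=128 is an assumption, not from the card."""
    model = models.Sequential([
        # In the trained model, this layer is initialized with the 100-d GloVe vectors.
        layers.Embedding(vocab_size, embed_dim),
        layers.LSTM(lstm_units, dropout=0.2),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Sanity check: a batch of 2 padded sequences of length 10 yields 2 x 5 probabilities.
probs = build_model()(np.zeros((2, 10), dtype="int32"))
```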

## Intended Use

This model classifies hotel reviews by sentiment, assigning each review a star rating from 1 to 5.

---
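
In practice this means the usual pre- and post-processing around the model: map words to indices with the 10,000-word vocabulary, pad to a fixed length, and convert the 5-way softmax output back to a star rating (argmax + 1). A minimal pure-Python sketch; the padding length and the index conventions (0 = padding, 1 = out-of-vocabulary) are assumptions, not taken from the card:

```python
def encode(text, vocab, max_len=200, oov_index=1):
    """Map words to vocabulary indices (0 reserved for padding) and pad/truncate."""
    ids = [vocab.get(word, oov_index) for word in text.lower().split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))

def to_stars(probs):
    """Convert a 5-way softmax output to a 1-5 star rating (argmax + 1)."""
    return max(range(len(probs)), key=probs.__getitem__) + 1

# Toy vocabulary just to show the mechanics ("breakfast" falls back to OOV).
vocab = {"great": 2, "hotel": 3, "terrible": 4}
x = encode("great hotel terrible breakfast", vocab, max_len=6)
star = to_stars([0.05, 0.10, 0.15, 0.30, 0.40])
```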

## Dataset

The dataset used for training, validation, and testing is `nhull/tripadvisor-split-dataset-v2`. It consists of:

- **Training Set**: 30,400 reviews
- **Validation Set**: 1,600 reviews
- **Test Set**: 8,000 reviews

All splits are balanced across the five sentiment labels.

---

### Test Performance

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.6041 |
| Precision | 0.60   |
| Recall    | 0.60   |
| F1-Score  | 0.60   |

#### Classification Report (Test Set)

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 1     | 0.72      | 0.73   | 0.73     | 1600    |
| 2     | 0.52      | 0.52   | 0.52     | 1600    |
| 3     | 0.57      | 0.48   | 0.52     | 1600    |
| 4     | 0.52      | 0.45   | 0.49     | 1600    |
| 5     | 0.65      | 0.84   | 0.73     | 1600    |

### Confusion Matrix (Test Set)

| True \ Predicted | 1    | 2    | 3    | 4    | 5    |
|------------------|------|------|------|------|------|
| **1**            | 1168 | 1222 | 1286 | 1334 | 710  |
| **2**            | 1200 | 832  | 1600 | 1648 | 1024 |
| **3**            | 1011 | 1347 | 768  | 1459 | 835  |
| **4**            | 1096 | 1432 | 1496 | 720  | 920  |
| **5**            | 1155 | 1491 | 1555 | 1603 | 1344 |

---
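
The diagonal of the confusion matrix (correct predictions per label) together with the per-class support of 1,600 reviews is enough to recover the recall column of the classification report:

```python
support = 1600
# Diagonal of the confusion matrix: correctly classified reviews per true label.
diagonal = {1: 1168, 2: 832, 3: 768, 4: 720, 5: 1344}

# Recall per label = correct predictions / support.
recall = {label: count / support for label, count in diagonal.items()}
# Matches the report: 0.73, 0.52, 0.48, 0.45, 0.84
```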

## Files Included

- **`correct_predictions.csv`**: Correctly classified reviews with their true and predicted labels.
- **`misclassified_predictions.csv`**: Misclassified reviews with their true and predicted labels, along with the difference between them.
- **`glove.6B.100d.txt`**: Pre-trained 100-dimensional GloVe embeddings used to initialize the model's embedding layer.

---
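
Turning `glove.6B.100d.txt` into initial embedding weights follows the standard GloVe recipe; a sketch, assuming the tokenizer's word-to-index mapping reserves index 0 for padding:

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, vocab_size=10_000, dim=100):
    """Fill a (vocab_size, dim) matrix with GloVe vectors for in-vocabulary words.

    Words missing from GloVe (and index 0, the padding slot) stay zero.
    """
    matrix = np.zeros((vocab_size, dim), dtype="float32")
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        word, vector = parts[0], parts[1:]
        idx = word_index.get(word)
        if idx is not None and idx < vocab_size:
            matrix[idx] = np.asarray(vector, dtype="float32")
    return matrix

# Tiny fake GloVe content (2-d vectors) just to show the mechanics:
fake_glove = ["good 0.1 0.2", "bad -0.3 0.4"]
emb = build_embedding_matrix(fake_glove, {"good": 1, "bad": 2}, vocab_size=4, dim=2)
```

With the real file, `glove_lines` would be `open("glove.6B.100d.txt", encoding="utf-8")` and `word_index` the tokenizer's vocabulary.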

## Limitations

1. **Domain-Specific**: The model was trained on TripAdvisor hotel reviews, so it may not generalize to other review platforms (e.g., Amazon, Yelp) or domains (e.g., tech product reviews) without further fine-tuning.
2. **Subjectivity**: Sentiment annotations are subjective and may not fully represent every user's perception, especially for neutral or mixed reviews.
3. **Performance**: The model performs worse on mid-range sentiment labels (2–4) than on the extreme labels (1 and 5), as mid-range reviews tend to use more nuanced language.
4. **Dependency on Pre-trained Embeddings**: The model relies on pre-trained GloVe embeddings, so its performance is closely tied to their quality and coverage. Because GloVe was trained on a large general corpus, it may not fully capture domain-specific phrasing in hotel and restaurant reviews.
5. **Architecture Simplicity**: The one-layer LSTM is simple and efficient, but it may not capture complex sequential patterns as effectively as deeper or bidirectional architectures such as a BiLSTM. This simplicity, however, can aid generalization on cross-domain data (e.g., the Michelin dataset).