Part of the NLP ZG collection: all datasets, models, and demos created during the NLP course at the University of Zagreb.
This repository contains a fine-tuned DistilBERT model trained for sentiment analysis on TripAdvisor reviews. The model predicts sentiment scores on a scale of 1 to 5 based on review text.
Base model: distilbert-base-uncased
Learning rate: 3e-05
Batch size: 64
Epochs: 10 (with early stopping)
Patience: 5 (epochs without improvement)
Tokenizer: distilbert-base-uncased
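The training setup above caps training at 10 epochs and stops early after 5 epochs without improvement. The sketch below illustrates that stopping rule only; it is not the training script used for this model. The function name `run_with_early_stopping` and the choice of validation loss as the monitored quantity are assumptions made for illustration.

```python
def run_with_early_stopping(val_losses, max_epochs=10, patience=5):
    """Replay per-epoch validation losses and apply a patience-based stop.

    Hypothetical helper: the model card does not say which quantity was
    monitored, so validation loss (lower is better) is assumed here.
    Returns (epochs_run, best_epoch), both 1-indexed.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0

    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # patience exhausted, stop training

    return epochs_run, best_epoch
```

With a cap of 10 epochs and a patience of 5, this rule can only cut training short when the best epoch falls within the first five; otherwise the epoch cap ends training first.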
This model classifies hotel reviews by sentiment, assigning each review a star rating from 1 to 5.
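As a rough sketch of the final step, the five class scores produced by a classifier like this one can be turned into a star rating as follows. The helper `logits_to_stars` is hypothetical, and the mapping of class index `i` to `i + 1` stars is an assumption; check the model's `id2label` configuration before relying on it.

```python
import math


def logits_to_stars(logits):
    """Convert five raw class scores into (stars, confidence).

    Assumption: class index 0 means 1 star and index 4 means 5 stars.
    """
    if len(logits) != 5:
        raise ValueError("expected one score per star rating (5 values)")

    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Pick the most probable class and shift from 0-based index to 1-5 stars.
    best_index = max(range(5), key=probs.__getitem__)
    return best_index + 1, probs[best_index]
```

The returned confidence is the softmax probability of the chosen class, which can help flag borderline reviews where neighbouring ratings score almost equally.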
The dataset used for training, validation, and testing is nhull/tripadvisor-split-dataset-v2. All of its splits are balanced across the five sentiment labels.
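On average, the model's predictions are off by 0.3934 stars (mean absolute error). The signed error computed from the confusion matrix below is much smaller, about +0.035 stars, so the model leans only slightly high.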
Metric | Value |
---|---|
Accuracy | 0.6391 |
Precision | 0.6416 |
Recall | 0.6391 |
F1-Score | 0.6400 |
Label | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
1 | 0.7483 | 0.6856 | 0.7156 | 1600 |
2 | 0.5445 | 0.5544 | 0.5494 | 1600 |
3 | 0.6000 | 0.6281 | 0.6137 | 1600 |
4 | 0.5828 | 0.5894 | 0.5861 | 1600 |
5 | 0.7326 | 0.7381 | 0.7354 | 1600 |
True \ Predicted | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
1 | 1097 | 437 | 60 | 3 | 3 |
2 | 327 | 887 | 344 | 34 | 8 |
3 | 37 | 278 | 1005 | 254 | 26 |
4 | 3 | 21 | 239 | 943 | 394 |
5 | 2 | 6 | 27 | 384 | 1181 |
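The reported figures can be cross-checked against the confusion matrix above. The following sketch recomputes accuracy, the mean absolute and signed rating errors, and the per-label precision, recall, and F1 from those counts alone. The function name `summarize` and its return layout are illustrative choices, not part of the original repository.

```python
# Confusion matrix from the table above: rows are true labels 1-5,
# columns are predicted labels 1-5.
CONFUSION = [
    [1097, 437, 60, 3, 3],
    [327, 887, 344, 34, 8],
    [37, 278, 1005, 254, 26],
    [3, 21, 239, 943, 394],
    [2, 6, 27, 384, 1181],
]


def summarize(matrix):
    """Derive summary metrics from a square confusion matrix."""
    n = len(matrix)
    cells = [(t, p, matrix[t][p]) for t in range(n) for p in range(n)]
    total = sum(count for _, _, count in cells)

    accuracy = sum(matrix[i][i] for i in range(n)) / total
    # Distance between predicted and true rating, in stars.
    mean_abs_error = sum(abs(p - t) * c for t, p, c in cells) / total
    # Positive values mean the model predicts too high on average.
    mean_signed_error = sum((p - t) * c for t, p, c in cells) / total

    per_label = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[t][k] for t in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[k + 1] = (precision, recall, f1)

    return {
        "accuracy": accuracy,
        "mean_abs_error": mean_abs_error,
        "mean_signed_error": mean_signed_error,
        "per_label": per_label,
    }


summary = summarize(CONFUSION)
print(f"accuracy          = {summary['accuracy']:.4f}")
print(f"mean abs error    = {summary['mean_abs_error']:.4f}")
print(f"mean signed error = {summary['mean_signed_error']:.4f}")
```

Most errors land in a neighbouring rating (for example, 437 one-star reviews predicted as two stars), which is why the mean absolute error stays well below one star even though accuracy is about 64%.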
validation_results_distilbert.csv: Contains correctly classified reviews with their real and predicted labels.
Base model: distilbert/distilbert-base-uncased