## Abstract

Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad?

An abundance of approaches has been introduced for tackling both tasks. However, at least for Italian, they all treat only one of the tasks at a time. We introduce *FEEL-IT*, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: **anger, fear, joy, sadness**. By collapsing them, we can also do **sentiment analysis**. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results.

We release an [open-source Python library](https://github.com/MilaNLProc/feel-it), so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text.

| Model | Download |
| ------ | -------------------------|

## Model

The *feel-it-italian-sentiment* model performs **sentiment analysis** on Italian. We fine-tuned the [UmBERTo model](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1) on our new dataset (i.e., FEEL-IT), obtaining state-of-the-art performance on different benchmark corpora.

## Data

Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper (preprint available soon).

## Performance

We evaluate our performance using [SENTIPOLC16 Evalita](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/). We collapsed the FEEL-IT classes into two by mapping joy to the *positive* class and anger, fear, and sadness to the *negative* class. We compare three training-set combinations, testing on the SP16 test set, to understand whether it is better to train on FEEL-IT, SP16, or both.

This dataset comes with a training set and a test set, so we can compare the performance of different training datasets on the SENTIPOLC test set.

We use the fine-tuned UmBERTo model. The results show that FEEL-IT can provide better results on the SENTIPOLC test set than those obtained with the SENTIPOLC training set.

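The class collapsing described above can be sketched as a simple label map. This is an illustrative sketch only; the label strings are assumptions based on the four emotions named in the abstract and may differ from the exact strings used in the released dataset files.

```python
# Sketch of the collapsing used for the SENTIPOLC evaluation: joy maps to
# positive, while anger, fear, and sadness map to negative. The label
# strings here are assumed, not taken from the dataset files.
EMOTION_TO_SENTIMENT = {
    "joy": "positive",
    "anger": "negative",
    "fear": "negative",
    "sadness": "negative",
}

def collapse_to_sentiment(emotion: str) -> str:
    """Map a FEEL-IT emotion label onto a binary sentiment label."""
    return EMOTION_TO_SENTIMENT[emotion.lower()]
```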
## Usage

```python
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("MilaNLProc/feel-it-italian-sentiment")

sentence = 'Oggi sono proprio contento!'
inputs = tokenizer(sentence, return_tensors="pt")

# Call the model and get the logits
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
outputs = model(**inputs, labels=labels)
loss, logits = outputs[:2]
logits = logits.squeeze(0)

# Extract the probabilities
proba = torch.nn.functional.softmax(logits, dim=0)

# Unpack the tensor to obtain negative and positive probabilities
negative, positive = proba
print(f"Probabilities: Negative {np.round(negative.item(), 4)} - Positive {np.round(positive.item(), 4)}")
```
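For quick experiments, the same checkpoint can also be loaded through the `transformers` pipeline API, which handles tokenization and softmax internally. This is a sketch, not part of the library's documented interface; the exact label strings in the output depend on the model's `id2label` configuration.

```python
from transformers import pipeline

# Sketch: let the pipeline handle tokenization and probability extraction.
# Requires the model to be downloadable; output label names depend on the
# model's id2label config.
classifier = pipeline(
    "text-classification",
    model="MilaNLProc/feel-it-italian-sentiment",
)

result = classifier("Oggi sono proprio contento!")
print(result)  # a list with one dict containing 'label' and 'score'
```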

## Citation