---
library_name: transformers
license: mit
datasets:
- seara/ru_go_emotions
base_model: ai-forever/ruRoberta-large
language:
- ru
tags:
- Text Classification
- emotion-classification
- emotion-recognition
- emotion-detection
- emotion
- multilabel
metrics:
- f1
- precision
- recall
---

This is the [ruRoberta-large](https://huggingface.co/ai-forever/ruRoberta-large) model fine-tuned on the [ru_go_emotions](https://huggingface.co/datasets/seara/ru_go_emotions) dataset for multilabel classification. The model can be used to extract all emotions expressed in a text or to detect specific emotions. The per-label thresholds were selected on the validation set by maximizing macro F1 over all labels.

# Usage

Using the model with Hugging Face Transformers:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("fyaronskiy/ruRoberta-large-ru-go-emotions")
model = AutoModelForSequenceClassification.from_pretrained("fyaronskiy/ruRoberta-large-ru-go-emotions")

# Per-label decision thresholds, in the same order as LABELS
best_thresholds = [0.36734693877551017, 0.2857142857142857, 0.2857142857142857, 0.16326530612244897, 0.14285714285714285, 0.14285714285714285, 0.18367346938775508, 0.3469387755102041, 0.32653061224489793, 0.22448979591836732, 0.2040816326530612, 0.2857142857142857, 0.18367346938775508, 0.2857142857142857, 0.24489795918367346, 0.7142857142857142, 0.02040816326530612, 0.3061224489795918, 0.44897959183673464, 0.061224489795918366, 0.18367346938775508, 0.04081632653061224, 0.08163265306122448, 0.1020408163265306, 0.22448979591836732, 0.3877551020408163, 0.3469387755102041, 0.24489795918367346]

LABELS = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']

ID2LABEL = dict(enumerate(LABELS))
```

Here is how to extract the emotions contained in a text:

```python
def predict_emotions(text):
    inputs = tokenizer(text, truncation=True, add_special_tokens=True, max_length=128, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multilabel setup: one independent sigmoid per label
    probas = torch.sigmoid(logits).squeeze(dim=0)
    # Keep the labels whose probability exceeds their own threshold
    class_binary_labels = (probas > torch.tensor(best_thresholds)).int()
    return [ID2LABEL[label_id] for label_id, value in enumerate(class_binary_labels) if value == 1]

# Input text in English: "You have great service and the best coffee in town, I adore your coffee shop!"
print(predict_emotions('У вас отличный сервис и лучший кофе в городе, обожаю вашу кофейню!'))

#['admiration', 'love']
```

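For several texts at once, the same thresholds can be applied row by row. The following is a minimal sketch of that decoding step only; `decode_batch` is a hypothetical helper that is not part of the original model card, and the three-label demo data is made up so the block runs without downloading the model.

```python
def decode_batch(probas_rows, thresholds, labels):
    """For each row of per-label probabilities, return the labels above their threshold.

    Hypothetical helper: probas_rows is a list of lists of floats,
    one inner list per text, in the same label order as `labels`.
    """
    results = []
    for row in probas_rows:
        if len(row) != len(thresholds):
            raise ValueError("each row needs one probability per threshold")
        results.append([label for label, p, t in zip(labels, row, thresholds) if p > t])
    return results


# Made-up demo with three labels and rounded thresholds
demo_labels = ['admiration', 'love', 'joy']
demo_thresholds = [0.37, 0.45, 0.31]
demo_probas = [[0.81, 0.538, 0.041], [0.10, 0.20, 0.90]]

print(decode_batch(demo_probas, demo_thresholds, demo_labels))
# [['admiration', 'love'], ['joy']]
```

With the real model, the rows could be obtained by tokenizing a list of texts with `padding=True` (alongside the `truncation`, `max_length` and `return_tensors='pt'` arguments used above) and calling `torch.sigmoid(model(**inputs).logits).tolist()`, then passing `best_thresholds` and `LABELS` to the helper.
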
Here is how to get all emotions together with their scores:

```python
def predict(text):
    inputs = tokenizer(text, truncation=True, add_special_tokens=True, max_length=128, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    probas = torch.sigmoid(logits).squeeze(dim=0).tolist()
    probas = [round(proba, 3) for proba in probas]

    # Map each label to its probability, highest score first
    labels2probas = dict(zip(LABELS, probas))
    probas_dict_sorted = dict(sorted(labels2probas.items(), key=lambda x: x[1], reverse=True))
    return probas_dict_sorted

print(predict('У вас отличный сервис и лучший кофе в городе, обожаю вашу кофейню!'))
'''{'admiration': 0.81,
 'love': 0.538,
 'joy': 0.041,
 'gratitude': 0.031,
 'approval': 0.026,
 'excitement': 0.023,
 'neutral': 0.009,
 'curiosity': 0.006,
 'amusement': 0.005,
 'desire': 0.005,
 'realization': 0.005,
 'caring': 0.004,
 'confusion': 0.004,
 'surprise': 0.004,
 'disappointment': 0.003,
 'disapproval': 0.003,
 'anger': 0.002,
 'annoyance': 0.002,
 'disgust': 0.002,
 'fear': 0.002,
 'grief': 0.002,
 'optimism': 0.002,
 'pride': 0.002,
 'relief': 0.002,
 'sadness': 0.002,
 'embarrassment': 0.001,
 'nervousness': 0.001,
 'remorse': 0.001}
'''
```

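To detect one specific emotion rather than list all of them, a score can be compared against that label's own threshold. The sketch below assumes a label-to-probability dict like the one returned by `predict`; `build_threshold_map` and `has_emotion` are hypothetical helpers, and the demo values are rounded figures for three labels taken from the example output above.

```python
def build_threshold_map(labels, thresholds):
    """Pair each label with its threshold (hypothetical helper)."""
    if len(labels) != len(thresholds):
        raise ValueError("labels and thresholds must have the same length")
    return dict(zip(labels, thresholds))


def has_emotion(scores, emotion, threshold_map):
    """Return True if the score for `emotion` exceeds that label's threshold."""
    return scores[emotion] > threshold_map[emotion]


# Demo with three labels only, so the block runs without the model
demo_thresholds = build_threshold_map(['admiration', 'love', 'joy'], [0.37, 0.45, 0.31])
demo_scores = {'admiration': 0.81, 'love': 0.538, 'joy': 0.041}

print(has_emotion(demo_scores, 'love', demo_thresholds))  # True
print(has_emotion(demo_scores, 'joy', demo_thresholds))   # False
```

With the full model this would read `has_emotion(predict(text), 'gratitude', build_threshold_map(LABELS, best_thresholds))`. Note that `predict` rounds scores to three decimals, which could in principle flip a decision for a score sitting almost exactly on its threshold.
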
# Evaluation results on the test split of ru_go_emotions

|              |precision|recall|f1-score|support|threshold|
|--------------|---------|------|--------|-------|---------|
|admiration    |0.63     |0.75  |0.69    |504    |0.37     |
|amusement     |0.76     |0.91  |0.83    |264    |0.29     |
|anger         |0.47     |0.32  |0.38    |198    |0.29     |
|annoyance     |0.33     |0.39  |0.36    |320    |0.16     |
|approval      |0.27     |0.58  |0.37    |351    |0.14     |
|caring        |0.32     |0.59  |0.41    |135    |0.14     |
|confusion     |0.41     |0.52  |0.46    |153    |0.18     |
|curiosity     |0.45     |0.73  |0.55    |284    |0.35     |
|desire        |0.54     |0.31  |0.40    |83     |0.33     |
|disappointment|0.31     |0.34  |0.33    |151    |0.22     |
|disapproval   |0.31     |0.57  |0.40    |267    |0.20     |
|disgust       |0.44     |0.40  |0.42    |123    |0.29     |
|embarrassment |0.48     |0.38  |0.42    |37     |0.18     |
|excitement    |0.29     |0.43  |0.34    |103    |0.29     |
|fear          |0.56     |0.78  |0.65    |78     |0.24     |
|gratitude     |0.95     |0.85  |0.89    |352    |0.71     |
|grief         |0.03     |0.33  |0.05    |6      |0.02     |
|joy           |0.48     |0.58  |0.53    |161    |0.31     |
|love          |0.73     |0.84  |0.78    |238    |0.45     |
|nervousness   |0.24     |0.48  |0.32    |23     |0.06     |
|optimism      |0.57     |0.54  |0.56    |186    |0.18     |
|pride         |0.67     |0.38  |0.48    |16     |0.04     |
|realization   |0.18     |0.31  |0.23    |145    |0.08     |
|relief        |0.30     |0.27  |0.29    |11     |0.10     |
|remorse       |0.53     |0.84  |0.65    |56     |0.22     |
|sadness       |0.56     |0.53  |0.55    |156    |0.39     |
|surprise      |0.55     |0.57  |0.56    |141    |0.35     |
|neutral       |0.59     |0.79  |0.68    |1787   |0.24     |
|micro avg     |0.50     |0.66  |0.57    |6329   |         |
|macro avg     |0.46     |0.55  |0.48    |6329   |         |
|weighted avg  |0.53     |0.66  |0.58    |6329   |         |
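
The thresholds in the table were tuned on the validation set. The exact procedure is not published here, so the following is only a sketch of one way such per-label thresholds could be chosen. It relies on two assumptions: since macro F1 is the mean of the per-label F1 scores, each label's threshold can be tuned independently; and the candidate grid is 50 evenly spaced values on [0, 1], which is an inference from the published thresholds all appearing to be multiples of 1/49. The names `f1_binary` and `select_thresholds` and the toy data are hypothetical.

```python
def f1_binary(y_true, y_pred):
    """F1 score for one label, given 0/1 ground truth and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_thresholds(y_true, probas, grid=None):
    """Pick, for every label, the grid threshold with the highest F1.

    y_true and probas are lists of rows, one row per sample and one
    column per label. Ties keep the lowest threshold.
    """
    if grid is None:
        grid = [i / 49 for i in range(50)]  # assumed grid, not confirmed by the author
    best = []
    for j in range(len(y_true[0])):
        truth = [row[j] for row in y_true]
        column = [row[j] for row in probas]
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            preds = [int(p > t) for p in column]
            score = f1_binary(truth, preds)
            if score > best_f1:
                best_t, best_f1 = t, score
        best.append(best_t)
    return best


# Toy validation data: 4 samples, 2 labels (made up for illustration)
toy_y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
toy_probas = [[0.9, 0.1], [0.6, 0.2], [0.3, 0.7], [0.2, 0.05]]

toy_thresholds = select_thresholds(toy_y_true, toy_probas)
print([round(t, 3) for t in toy_thresholds])
# [0.306, 0.204]
```

Tuning on the validation split and reporting on the test split, as done for the table above, keeps the reported scores from being inflated by the threshold search itself. Thresholds for very rare labels such as grief (support 6) rest on few examples and are likely to be less stable.
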