EmoBench-UA: Emotions Detection in Ukrainian Texts

Model Details
We provide the first emotion detector of its kind for Ukrainian texts. It covers six basic emotions (Joy, Anger, Fear, Disgust, Surprise, Sadness) plus None. A text can express one emotion, several, or none at all. Texts labeled None are those where the labels for all six emotion classes are 0.
The base model intfloat/multilingual-e5-large was fine-tuned for multi-label emotion classification on the train part of the ukr-emotions-binary dataset.
Usage: For each emotion, the model outputs 0 or 1, indicating whether that emotion is present in the text. The model can therefore be used to detect the presence or absence of any of the basic emotions in Ukrainian texts.
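The label scheme described above can be sketched as a small decoding step. This is a minimal illustration, assuming each annotated row holds six 0/1 flags; the `EMOTIONS` order and the `decode_labels` helper are hypothetical and are not part of the dataset or the model API.

```python
# Assumed label order for illustration only; the real order should be
# checked against the dataset columns or model.config.id2label.
EMOTIONS = ["Joy", "Fear", "Anger", "Sadness", "Disgust", "Surprise"]


def decode_labels(binary_row):
    """Map a row of six 0/1 flags to emotion names; all zeros means None."""
    if len(binary_row) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} flags, got {len(binary_row)}")
    present = [name for name, flag in zip(EMOTIONS, binary_row) if flag == 1]
    # A text with no emotion flagged is treated as None.
    return present if present else ["None"]


print(decode_labels([1, 0, 0, 0, 0, 1]))  # ['Joy', 'Surprise']
print(decode_labels([0, 0, 0, 0, 0, 0]))  # ['None']
```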
Evaluation Results
General classification report of our model on the test part of ukr-emotions-binary:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
Joy | 0.69 | 0.78 | 0.73 | 368 |
Fear | 0.80 | 0.81 | 0.81 | 151 |
Anger | 0.41 | 0.25 | 0.31 | 99 |
Sadness | 0.67 | 0.71 | 0.69 | 298 |
Disgust | 0.63 | 0.24 | 0.35 | 79 |
Surprise | 0.52 | 0.72 | 0.60 | 175 |
None | 0.82 | 0.80 | 0.81 | 1108 |
micro avg | 0.73 | 0.74 | 0.73 | 2278 |
macro avg | 0.65 | 0.62 | 0.62 | 2278 |
weighted avg | 0.73 | 0.74 | 0.73 | 2278 |
samples avg | 0.72 | 0.74 | 0.72 | 2278 |
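
As a rough cross-check of how the summary rows relate to the per-class rows, the sketch below recomputes F1 from the rounded precision and recall values in the table, then takes the macro and support-weighted averages. Because the inputs are already rounded to two decimals, the recomputed numbers may differ from the table by about 0.01. The variable and function names are illustrative.

```python
# Per-class (precision, recall, support) copied from the table above.
report = {
    "Joy": (0.69, 0.78, 368),
    "Fear": (0.80, 0.81, 151),
    "Anger": (0.41, 0.25, 99),
    "Sadness": (0.67, 0.71, 298),
    "Disgust": (0.63, 0.24, 79),
    "Surprise": (0.52, 0.72, 175),
    "None": (0.82, 0.80, 1108),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in report.items()}
total_support = sum(support for _, _, support in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = (
    sum(f1_scores[label] * support for label, (_, _, support) in report.items())
    / total_support
)

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.3f}")
print(f"weighted F1:   {weighted_f1:.3f}")
```

The gap between the macro and weighted averages comes from the low-support classes (Anger, Disgust), which pull the macro average down while contributing little to the weighted one.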

How to Get Started with the Model
We provide the code to get started with the model:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# 1. Set up model & tokenizer (update model_name as needed)
model_name = "ukr-detect/ukr-emotions-classifier"  # Or path to local dir
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval() # For inference
# 2. Define per-label thresholds (keys must match the label names in the model's id2label)
thresholds = {
    "Joy": 0.35,
    "Fear": 0.5,
    "Anger": 0.25,
    "Sadness": 0.5,
    "Disgust": 0.3,
    "Surprise": 0.25,
    "None": 0.35
}
# 3. Prepare a function for prediction
def predict_emotions(texts):
    # Tokenize
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**enc)
        logits = outputs.logits
        probs = torch.sigmoid(logits)
    # Get label mapping in correct order
    id2label = model.config.id2label
    label_order = [id2label[i] for i in range(len(id2label))]
    # Threshold and get predictions
    predictions = []
    for prob_row in probs:
        single_pred = [
            label for label, prob in zip(label_order, prob_row.tolist())
            if prob > thresholds[label]
        ]
        predictions.append(single_pred)
    return predictions
# 4. Example usage
texts = [
    "Я щойно отримав підвищення на роботі!",
    "Я хвилююся за майбутнє.",
    "я не буду заради тебе, Гані і Каті терпіти того пйоса",
    "Сьогодні нічого не сталося.",
    "всі чомусь плутають, а мені то дивно так )))",
    " ого, то зовсім не приємно(("
]
results = predict_emotions(texts)
for t, r in zip(texts, results):
    print(f"Text: {t}\nEmotions: {r}\n")
# Expected output
Text: Я щойно отримав підвищення на роботі!
Emotions: ['Joy', 'Surprise']
Text: Я хвилююся за майбутнє.
Emotions: ['Fear']
Text: я не буду заради тебе, Гані і Каті терпіти того пйоса
Emotions: ['Anger', 'Disgust']
Text: Сьогодні нічого не сталося.
Emotions: ['None']
Text: всі чомусь плутають, а мені то дивно так )))
Emotions: ['Surprise']
Text: ого, то зовсім не приємно((
Emotions: ['Sadness']
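To see how the per-label thresholds behave without loading the model, here is a minimal sketch that applies them to made-up probabilities. The `apply_thresholds` helper and the values in `example_probs` are illustrative assumptions, not real model output.

```python
# Thresholds copied from the snippet above.
thresholds = {
    "Joy": 0.35,
    "Fear": 0.5,
    "Anger": 0.25,
    "Sadness": 0.5,
    "Disgust": 0.3,
    "Surprise": 0.25,
    "None": 0.35,
}


def apply_thresholds(probs, thresholds):
    """Return the labels whose probability is strictly above their threshold."""
    return [label for label, p in probs.items() if p > thresholds[label]]


# Hypothetical sigmoid outputs for one text (not produced by the model).
example_probs = {
    "Joy": 0.40,
    "Fear": 0.40,
    "Anger": 0.10,
    "Sadness": 0.45,
    "Disgust": 0.05,
    "Surprise": 0.30,
    "None": 0.20,
}

print(apply_thresholds(example_probs, thresholds))  # ['Joy', 'Surprise']
```

Here the same score of 0.40 passes the Joy threshold (0.35) but not the Fear threshold (0.5), which is why the thresholds are defined per label.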
Citation
If you would like to acknowledge our work, please cite the following manuscript:
TBD
Contacts
Model tree for ukr-detect/ukr-emotions-classifier
Base model
intfloat/multilingual-e5-large