|
--- |
|
license: mit |
|
language: |
|
- ru |
|
metrics: |
|
- f1 |
|
- roc_auc |
|
- precision |
|
- recall |
|
pipeline_tag: text-classification |
|
tags: |
|
- sentiment-analysis |
|
- multi-class-classification |
|
- sentiment analysis |
|
- rubert |
|
- sentiment |
|
- bert |
|
- russian |
|
- multiclass |
|
- classification |
|
datasets: |
|
- sismetanin/rureviews |
|
- RuSentiment |
|
- LinisCrowd2015 |
|
- LinisCrowd2016 |
|
- KaggleRussianNews |
|
--- |
|
|
|
This is [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts. |
|
The task is a __multi-class classification__ with the following labels: |
|
|
|
```yaml |
|
0: neutral |
|
1: positive |
|
2: negative |
|
``` |
|
|
|
Label to Russian label: |
|
|
|
```yaml |
|
neutral: нейтральный |
|
positive: позитивный |
|
negative: негативный |
|
``` |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import pipeline |
|
model = pipeline(model="seara/rubert-base-cased-russian-sentiment") |
|
model("Привет, ты мне нравишься!") |
|
# [{'label': 'positive', 'score': 0.9818321466445923}] |
|
``` |
|
|
|
## Dataset |
|
|
|
This model was trained on the union of the following datasets: |
|
|
|
- Kaggle Russian News Dataset |
|
- Linis Crowd 2015 |
|
- Linis Crowd 2016 |
|
- RuReviews |
|
- RuSentiment |
|
|
|
An overview of the training data can be found on [S. Smetanin Github repository](https://github.com/sismetanin/sentiment-analysis-in-russian). |
|
|
|
__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [repository](https://github.com/searayeah/russian-sentiment-emotion-datasets).__ |
|
|
|
## Training |
|
|
|
Training were done in this [project](https://github.com/searayeah/bert-russian-sentiment-emotion) with this parameters: |
|
|
|
```yaml |
|
tokenizer.max_length: 256 |
|
batch_size: 32 |
|
optimizer: adam |
|
lr: 0.00001 |
|
weight_decay: 0 |
|
epochs: 2 |
|
``` |
|
|
|
Train/validation/test splits are 80%/10%/10%. |
|
|
|
## Eval results (on test split) |
|
|
|
| |neutral|positive|negative|macro avg|weighted avg| |
|
|---------|-------|--------|--------|---------|------------| |
|
|precision|0.72 |0.85 |0.75 |0.77 |0.77 | |
|
|recall |0.75 |0.84 |0.72 |0.77 |0.77 | |
|
|f1-score |0.73 |0.84 |0.73 |0.77 |0.77 | |
|
|auc-roc |0.86 |0.96 |0.92 |0.91 |0.91 | |
|
|support |5196 |3831 |3599 |12626 |12626 | |
|
|