---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- sentiment-analysis
- multi-class-classification
- sentiment analysis
- rubert
- sentiment
- bert
- russian
- multiclass
- classification
datasets:
- sismetanin/rureviews
- RuSentiment
- LinisCrowd2015
- LinisCrowd2016
- KaggleRussianNews
---

This is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts.
The task is __multi-class classification__ with the following labels:

```yaml
0: neutral
1: positive
2: negative
```
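
If you call the model directly (for example through `AutoModelForSequenceClassification`) instead of through `pipeline`, you get three raw logits per text. The pure-Python sketch below shows how those map to the labels above. It assumes the logits come in id order 0, 1, 2 (the authoritative mapping is the model's `config.id2label`) and applies a softmax, as `pipeline` does for single-label classification. `logits_to_prediction` is a hypothetical helper and the logit values are made up for illustration.

```python
import math

# Label ids as listed in this model card (assumed to match config.id2label).
ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}

def logits_to_prediction(logits):
    """Hypothetical helper: turn 3 raw logits into (label, per-label probabilities)."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class id is the index of the highest probability.
    best_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best_id], {ID2LABEL[i]: p for i, p in enumerate(probs)}

# Made-up logits for illustration only.
label, probs = logits_to_prediction([0.1, 2.3, -1.0])
print(label)  # positive
```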

Russian equivalents of the labels:

```yaml
neutral: нейтральный
positive: позитивный
negative: негативный
```
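
To show Russian labels to end users, the English labels returned by the model can be swapped with a plain dictionary lookup. A minimal sketch: `RU_LABELS` and `to_russian` are hypothetical names, and the input mimics the list-of-dicts format that `pipeline` returns in the Usage section.

```python
# English label -> Russian label, as listed in this model card.
RU_LABELS = {
    "neutral": "нейтральный",
    "positive": "позитивный",
    "negative": "негативный",
}

def to_russian(predictions):
    """Hypothetical helper: replace each English 'label' with its Russian equivalent."""
    # Copy each dict so the original pipeline output is left unchanged.
    return [{**p, "label": RU_LABELS[p["label"]]} for p in predictions]

# Output format copied from the Usage example.
result = to_russian([{"label": "positive", "score": 0.9818321466445923}])
print(result)
```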

## Usage

```python
from transformers import pipeline
model = pipeline(model="seara/rubert-base-cased-russian-sentiment")
model("Привет, ты мне нравишься!")
# [{'label': 'positive', 'score': 0.9818321466445923}]
```

## Dataset

This model was trained on the union of the following datasets:

- Kaggle Russian News Dataset
- Linis Crowd 2015
- Linis Crowd 2016
- RuReviews
- RuSentiment

An overview of the training data can be found in [S. Smetanin's GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).

__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [repository](https://github.com/searayeah/russian-sentiment-emotion-datasets).__

## Training

Training was done in this [project](https://github.com/searayeah/bert-russian-sentiment-emotion) with the following parameters:

```yaml
tokenizer.max_length: 256
batch_size: 32
optimizer: adam
lr: 0.00001
weight_decay: 0
epochs: 2
```

The train/validation/test split is 80%/10%/10%.
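
The card does not say how the split was drawn (plain random or stratified by label, and with which seed). The sketch below shows one way to produce an 80%/10%/10% split, assuming a simple seeded shuffle; `split_80_10_10` and the seed value are hypothetical and not taken from the training project.

```python
import random

def split_80_10_10(items, seed=42):
    """Hypothetical helper: shuffle items and split them 80%/10%/10%."""
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any remainder goes to the test split
    return train, val, test

train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

With an imbalanced label distribution, a stratified split (shuffling and slicing each label's examples separately) would keep class proportions equal across the three parts.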

## Evaluation results (test split)

| metric    | neutral | positive | negative | macro avg | weighted avg |
|-----------|---------|----------|----------|-----------|--------------|
| precision | 0.72    | 0.85     | 0.75     | 0.77      | 0.77         |
| recall    | 0.75    | 0.84     | 0.72     | 0.77      | 0.77         |
| f1-score  | 0.73    | 0.84     | 0.73     | 0.77      | 0.77         |
| auc-roc   | 0.86    | 0.96     | 0.92     | 0.91      | 0.91         |
| support   | 5196    | 3831     | 3599     | 12626     | 12626        |
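
The "macro avg" column is the unweighted mean over the three classes, and "weighted avg" weights each class by its support (number of test examples). The sketch below recomputes both for precision from the rounded per-class values in the table. Because the inputs are already rounded, this only approximately reproduces the reported averages; the helper names are hypothetical.

```python
# Per-class precision and support copied from the table above (rounded values).
precision = {"neutral": 0.72, "positive": 0.85, "negative": 0.75}
support = {"neutral": 5196, "positive": 3831, "negative": 3599}

def macro_avg(scores):
    """Unweighted mean over classes."""
    return sum(scores.values()) / len(scores)

def weighted_avg(scores, support):
    """Mean over classes weighted by the number of test examples per class."""
    total = sum(support.values())  # 12626 test examples
    return sum(scores[c] * support[c] for c in scores) / total

print(round(macro_avg(precision), 2), round(weighted_avg(precision, support), 2))  # 0.77 0.77
```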