bad-good-text-classifier-ru-en

Description

This is a simple, effective neural network that classifies words as positive ("good") or negative ("bad") in both Russian and English. It is suitable for filtering chats, comments, reviews and other texts to detect toxicity or negative content. The model is not perfect and can misclassify some inputs.

Features

Bilingual model: Russian (the primary focus) and English
Fast and accurate classification
Easy integration into Python projects
Trained on a custom dataset with "good" and "bad" labels

Installation

Make sure you have Python 3.7+, then install the Hugging Face transformers package together with PyTorch:

pip install transformers torch

Usage

Example of classifying a single text:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "akaruineko/bad-good-classifier-ru_en"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

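def classify_word(word):
    # Tokenize a single word (or short text) into model inputs
    inputs = tokenizer(word, return_tensors="pt", truncation=True, padding=True)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to probabilities; index 0 is "bad", index 1 is "good"
    probs = torch.softmax(outputs.logits, dim=1)
    return {"good": probs[0][1].item(), "bad": probs[0][0].item()}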

def classify_text_by_words(text):
    words = text.split()
    results = {}
    for w in words:
        results[w] = classify_word(w)
    return results

if __name__ == "__main__":
    sample_text = "Example text for classification"
    results = classify_text_by_words(sample_text)
    for word, scores in results.items():
        print(f"Word: '{word}' - Good: {scores['good']:.4f}, Bad: {scores['bad']:.4f}")
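
For the chat and comment filtering use case mentioned in the description, the per-word scores can be reduced to a list of flagged words. The following is a minimal sketch and not part of the model's API: `flag_bad_words` and the 0.5 threshold are assumptions chosen for illustration, and the input has the same shape that `classify_text_by_words` returns.

```python
def flag_bad_words(results, threshold=0.5):
    """Return the words whose "bad" probability is at or above the threshold.

    `results` maps each word to {"good": float, "bad": float}, the same shape
    that classify_text_by_words returns. The default threshold is a
    hypothetical value; tune it on your own data.
    """
    return [word for word, scores in results.items() if scores["bad"] >= threshold]


if __name__ == "__main__":
    # Hand-written scores for illustration only; they are not real model output.
    example_scores = {
        "hello": {"good": 0.97, "bad": 0.03},
        "awful": {"good": 0.10, "bad": 0.90},
    }
    print(flag_bad_words(example_scores))
```

A higher threshold flags fewer words and reduces false positives; a lower one catches more at the cost of extra false alarms.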

Label mapping: LABEL_0 = bad, LABEL_1 = good.
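
If you use the `pipeline` helper from transformers instead of calling the model directly, predictions come back as dictionaries carrying the raw label names. The sketch below translates them using the mapping above. It assumes the model config keeps the default `LABEL_0`/`LABEL_1` names; `LABEL_NAMES` and `rename_labels` are hypothetical helper names, and the sample predictions are hand-written rather than real model output.

```python
# Mapping stated in this README: LABEL_0 = bad, LABEL_1 = good
LABEL_NAMES = {"LABEL_0": "bad", "LABEL_1": "good"}


def rename_labels(predictions):
    """Replace raw label ids with readable names; unknown labels pass through."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


if __name__ == "__main__":
    # With the real model, predictions would come from something like:
    #   from transformers import pipeline
    #   classifier = pipeline("text-classification",
    #                         model="akaruineko/bad-good-classifier-ru_en")
    #   predictions = classifier(["hello", "awful"])
    # Hand-written stand-in so the sketch runs without downloading the model:
    predictions = [
        {"label": "LABEL_1", "score": 0.97},
        {"label": "LABEL_0", "score": 0.90},
    ]
    print(rename_labels(predictions))
```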

Training Data

The model was trained on two manually prepared datasets labeled "good" and "bad", containing texts in Russian and English.

Training Results

Epochs: 12
Minimum loss: ~0.03
High accuracy on test dataset
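
No exact accuracy figure is given above, so it can be worth measuring accuracy on a labeled sample of your own. This is a minimal sketch, assuming you already have predicted and expected labels as lists of "good"/"bad" strings; `accuracy` is a hypothetical helper, and the sample lists are made up for illustration.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Made-up labels for illustration; replace with your own evaluation data.
    predicted = ["good", "bad", "bad", "good"]
    expected = ["good", "bad", "good", "good"]
    print(f"Accuracy: {accuracy(predicted, expected):.2f}")
```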

License

MIT License.

Contact

Questions or suggestions? Write to: [email protected]

Thanks for using this classifier! Feel free to share feedback and improvement ideas.
