bad-good-text-classifier-ru-en

Description

This is a simple and effective neural network that classifies words as positive ("good") or negative ("bad") in Russian and English. It is suitable for filtering chats, comments, reviews and other texts to detect toxic or negative content. However, the model is not perfect and may misclassify some inputs.

Features

  • Bilingual model: Russian (primary focus) and English
  • Fast and accurate classification
  • Easy integration into Python projects
  • Trained on a custom dataset with "good" and "bad" labels

Installation

Make sure you have Python 3.7+ installed, then install the Hugging Face transformers package and PyTorch:

```bash
pip install transformers torch
```
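To confirm the environment before running the examples, a quick check like the one below can help. It is a minimal sketch that only verifies the Python version and that the `transformers` and `torch` packages are importable; the function name `check_environment` is hypothetical and not part of this model.

```python
import importlib.util
import sys


def check_environment(packages=("transformers", "torch")):
    """Report whether Python is 3.7+ and whether each package can be imported."""
    report = {"python_ok": sys.version_info >= (3, 7)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```

Any `False` value in the report points to the missing requirement.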

Usage

Example of classifying a text word by word:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "akaruineko/bad-good-classifier-ru_en"

# Download (on first run) and load the tokenizer and the classification model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

def classify_word(word):
    """Return probabilities for "good" (LABEL_1) and "bad" (LABEL_0)."""
    inputs = tokenizer(word, return_tensors="pt", truncation=True)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)  # logits -> probabilities
    return {"good": probs[0][1].item(), "bad": probs[0][0].item()}

def classify_text_by_words(text):
    """Classify each whitespace-separated word of `text` independently."""
    results = {}
    for w in text.split():
        results[w] = classify_word(w)
    return results

if __name__ == "__main__":
    sample_text = "Example text for classification"
    results = classify_text_by_words(sample_text)
    for word, scores in results.items():
        print(f"Word: '{word}' - Good: {scores['good']:.4f}, Bad: {scores['bad']:.4f}")
```
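For chat or comment filtering, the per-word scores returned by `classify_text_by_words` can be reduced to a single decision. The sketch below flags a text if any word's "bad" probability reaches a threshold. The function name `is_text_toxic`, the 0.5 default threshold, and the "any word" rule are illustrative assumptions, not part of this model, and should be tuned on your own data.

```python
def is_text_toxic(word_scores, threshold=0.5):
    """Flag a text if any word's "bad" probability is >= threshold.

    word_scores has the shape returned by classify_text_by_words:
    {"word": {"good": float, "bad": float}, ...}
    Returns a tuple (flag, list of flagged words in input order).
    """
    # Assumption: a single clearly negative word is enough to flag the text
    flagged = [w for w, s in word_scores.items() if s["bad"] >= threshold]
    return bool(flagged), flagged
```

Example: `toxic, words = is_text_toxic(classify_text_by_words(message))`. A stricter rule (for example, requiring several flagged words) reduces false positives at the cost of missing short insults.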

Label mapping: LABEL_0 = bad, LABEL_1 = good.
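Because the model reports generic label names, it can help to translate them when using the high-level `transformers` `pipeline("text-classification", ...)` API, whose predictions are dicts with `label` and `score` keys. This sketch assumes the checkpoint's config uses the `LABEL_0`/`LABEL_1` names given above; the helper `readable_prediction` is hypothetical.

```python
# Mapping stated in this model card: LABEL_0 = bad, LABEL_1 = good
LABEL_NAMES = {"LABEL_0": "bad", "LABEL_1": "good"}


def readable_prediction(prediction):
    """Turn {"label": "LABEL_1", "score": 0.97} into {"label": "good", "score": 0.97}.

    Labels not in LABEL_NAMES are passed through unchanged.
    """
    label = prediction["label"]
    return {"label": LABEL_NAMES.get(label, label), "score": prediction["score"]}


# Usage (downloads the model on first run):
# from transformers import pipeline
# clf = pipeline("text-classification", model="akaruineko/bad-good-classifier-ru_en")
# print([readable_prediction(p) for p in clf(["hello", "thanks"])])
```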

Training Data

The model was trained on two manually prepared datasets, one labeled "good" and one labeled "bad", containing texts in Russian and English.
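The dataset format is not specified here. As a minimal sketch, assuming each dataset is a plain list of texts (for example, one per line in a text file), the two sets could be merged into labeled examples that follow the label mapping above (bad = 0, good = 1). The function name `build_examples` and the fixed-seed shuffle are illustrative assumptions, not the author's training pipeline.

```python
import random


def build_examples(good_texts, bad_texts, seed=42):
    """Merge two lists of texts into shuffled (text, label_id) pairs.

    Label ids follow the mapping LABEL_0 = bad (0), LABEL_1 = good (1).
    Surrounding whitespace is stripped and blank entries are skipped.
    """
    examples = [(t.strip(), 1) for t in good_texts if t.strip()]
    examples += [(t.strip(), 0) for t in bad_texts if t.strip()]
    random.Random(seed).shuffle(examples)  # deterministic shuffle for reproducibility
    return examples
```

Shuffling keeps batches from containing only one class when the data is later split or batched in order.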

Training Results

  • Epochs: 12
  • Minimum loss: ~0.03
  • High accuracy on the test dataset
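As a rough interpretation, assuming the reported loss is the mean cross-entropy over $N$ examples (which split it was measured on is not stated), a loss of about 0.03 corresponds to a geometric-mean probability of roughly 0.97 assigned to the correct label:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i),
\qquad
\Big(\prod_{i=1}^{N} p_\theta(y_i \mid x_i)\Big)^{1/N} = e^{-\mathcal{L}} \approx e^{-0.03} \approx 0.970
$$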

License

MIT License.

Contact

Questions or suggestions? Write to: [email protected]


Thanks for using this classifier! Feel free to share feedback and improvement ideas.

Model Details

  • Weights format: Safetensors
  • Model size: 11.8M parameters
  • Tensor type: F32