# bad-good-text-classifier-ru-en

## Description

This is a simple, effective neural network that classifies words as positive ("good") or negative ("bad") in both Russian and English. It is suitable for filtering chats, comments, reviews, and other texts to detect toxic or negative content. Note that the model is not perfect and may misclassify some inputs.
## Features

- Bilingual model (Russian and English, with a primary focus on Russian)
- Fast and accurate classification
- Easy integration into Python projects
- Trained on a custom dataset with "good" and "bad" labels
## Installation

Make sure you have Python 3.7+ and the Hugging Face `transformers` package installed:

```bash
pip install transformers torch
```
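To verify the installation, a quick import check is enough (the printed versions will depend on your environment):

```python
# Quick sanity check that both packages are importable
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```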
## Usage

Example of classifying a text word by word:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "akaruineko/bad-good-classifier-ru_en"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_word(word):
    """Return the probabilities of the "good" and "bad" classes for a single word."""
    inputs = tokenizer(word, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    return {"good": probs[0][1].item(), "bad": probs[0][0].item()}

def classify_text_by_words(text):
    """Classify each whitespace-separated word of a text individually."""
    return {word: classify_word(word) for word in text.split()}

if __name__ == "__main__":
    sample_text = "Example text for classification"
    results = classify_text_by_words(sample_text)
    for word, scores in results.items():
        print(f"Word: '{word}' - Good: {scores['good']:.4f}, Bad: {scores['bad']:.4f}")
```
Label mapping: `LABEL_0` = bad, `LABEL_1` = good.
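If you prefer to score a whole sentence at once instead of word by word, the standard `transformers` pipeline API also works with this checkpoint. A minimal sketch (the printed scores below are illustrative, not actual model output):

```python
from transformers import pipeline

# Text-classification pipeline over the same checkpoint;
# returns the top label and its probability for the full input text.
classifier = pipeline("text-classification", model="akaruineko/bad-good-classifier-ru_en")

print(classifier("Example text for classification"))
# Illustrative output: [{'label': 'LABEL_1', 'score': 0.97}]  (LABEL_1 = good)
```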
## Training Data

The model was trained on two manually prepared datasets labeled "good" and "bad", containing texts in both Russian and English.
## Training Results

- Epochs: 12
- Minimum loss: ~0.03
- High accuracy on the test dataset (exact metrics not reported)
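The original training script is not published. For reference, here is a minimal fine-tuning sketch under assumed conditions: a hypothetical `train.csv` with `text` and `label` columns (0 = bad, 1 = good), the cointegrated/rubert-tiny base model listed at the end of this card, and default hyperparameters except for the epoch count reported above. Adapt it to your own data.

```python
# Minimal fine-tuning sketch, NOT the author's original training code.
# Assumes a hypothetical train.csv with "text" and "label" columns (0 = bad, 1 = good).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "cointegrated/rubert-tiny"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

dataset = load_dataset("csv", data_files={"train": "train.csv"})

def tokenize(batch):
    # Pad/truncate to a short fixed length; single words and short phrases fit easily
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bad-good-classifier",
    num_train_epochs=12,              # epoch count reported above
    per_device_train_batch_size=32,   # assumed; not reported
    learning_rate=5e-5,               # assumed; not reported
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```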
## License
MIT License.
## Contact
Questions or suggestions? Write to: [email protected]
Thanks for using this classifier! Feel free to share feedback and improvement ideas.
## Base Model

Fine-tuned from cointegrated/rubert-tiny.