Multilingual Hate Speech Detection - XLM-RoBERTa

This is a fine-tuned version of XLM-RoBERTa Base trained for multilingual hate speech detection in Spanish 🇪🇸, English 🇬🇧, and French 🇫🇷.
It is part of a master's thesis project focused on real-time detection of hate speech in videos and their transcripts.

🧠 Intended Use

This model is designed to work with short- to medium-length text snippets extracted from video subtitles or transcripts.
It returns a binary classification (hate or not hate) with a probability score for further analysis.
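
For longer transcripts, one way to produce such snippets is to group consecutive subtitle lines under a word budget. The sketch below is only an illustration: `chunk_subtitles` and `max_words` are hypothetical names that are not part of this model, and the word count is a rough proxy for the tokenizer's real length limit.

```python
from typing import Iterable, List


def chunk_subtitles(lines: Iterable[str], max_words: int = 50) -> List[str]:
    """Group consecutive subtitle lines into snippets of at most max_words words.

    A single line longer than max_words is kept whole as its own snippet.
    """
    snippets: List[str] = []
    current: List[str] = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip empty subtitle lines
        # Close the current snippet if this line would exceed the budget
        if current and len(current) + len(words) > max_words:
            snippets.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        snippets.append(" ".join(current))
    return snippets


if __name__ == "__main__":
    demo = ["Hello and welcome", "to the show", "", "today we talk about football"]
    for snippet in chunk_subtitles(demo, max_words=6):
        print(snippet)
```

Each resulting snippet can then be classified on its own, which keeps the per-snippet probability tied to a specific part of the video.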

📊 Training Data

This model was fine-tuned on a custom multilingual dataset composed of selected and preprocessed samples from multiple public corpora and custom-curated sets. The training set was carefully constructed to achieve language balance and mitigate demographic bias in hate speech detection.

| Source Dataset | Language(s) | Description |
|---|---|---|
| manueltonneau/spanish-hate-speech-superset | Spanish 🇪🇸 | Aggregated Spanish hate speech datasets. |
| manueltonneau/english-hate-speech-superset | English 🇬🇧 | Extensive superset with over 300k samples from English corpora. |
| manueltonneau/french-hate-speech-superset | French 🇫🇷 | Curated superset from multiple French datasets. |
| HateCheck | English (original) + Spanish + French 🌐 | Translated into Spanish and French to test multilingual generalization and error cases. |
| Custom Bias Correction Dataset | Multilingual 🌍 | Designed to mitigate gender, racial, and cultural bias in predictions. |

🧩 The final dataset consists of ~60,000 balanced samples, with comparable representation across Spanish, English, and French, ensuring no language dominates the training phase.

This balancing process involved sampling, filtering, and label unification from larger sources. The result is a compact, diverse, and inclusive dataset designed to generalize across cultures and languages while avoiding common pitfalls in hate speech modeling.

🔎 How to use

import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "WhiterBB/multilingual-hatespeech-detection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

text = "Je déteste cette personne"
# Truncate inputs that exceed the model's maximum sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1)  # convert logits to class probabilities
    predicted_class = torch.argmax(probs, dim=-1).item()
    confidence = probs[0, predicted_class].item()

# Class index 1 is "Hate"; index 0 is "Not Hate"
label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({confidence:.2%})")
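
For further analysis it can be useful to apply a custom decision threshold to the "Hate" probability instead of taking the argmax. The sketch below reproduces the softmax step in plain Python on a pair of logits; `classify` and `threshold` are hypothetical names, and the assumption that index 1 is the "Hate" class is taken from the snippet above.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return the label and the "Hate" probability for [not_hate, hate] logits."""
    p_hate = softmax(logits)[1]  # assumption: index 1 is the "Hate" class
    label = "Hate" if p_hate > threshold else "Not Hate"
    return label, p_hate


if __name__ == "__main__":
    example_logits = [2.0, 0.0]  # made-up values, not real model output
    print(classify(example_logits))
    print(classify(example_logits, threshold=0.1))
```

With the default threshold of 0.5 this matches the argmax decision for two classes; a lower threshold flags more text as hate (higher recall, lower precision), and a higher one does the opposite.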

🧪 Metrics

The model was evaluated on a balanced multilingual dataset of 56,961 examples (30,352 Not Hate and 26,609 Hate). The performance metrics are:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Not Hate | 0.85 | 0.83 | 0.84 | 30,352 |
| Hate | 0.81 | 0.83 | 0.82 | 26,609 |

Overall Accuracy: 0.83
Macro Average: Precision: 0.83, Recall: 0.83, F1-score: 0.83
Weighted Average: Precision: 0.83, Recall: 0.83, F1-score: 0.83
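
As a quick sanity check, the macro and weighted averages can be recomputed from the per-class rows. The sketch below uses only the rounded figures reported above, so it can reproduce the averages only to two decimal places; the helper names are illustrative.

```python
# Per-class scores and supports copied from the metrics reported above.
per_class = {
    "Not Hate": {"precision": 0.85, "recall": 0.83, "f1": 0.84, "support": 30352},
    "Hate": {"precision": 0.81, "recall": 0.83, "f1": 0.82, "support": 26609},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of a metric across classes."""
    return sum(c[metric] for c in per_class.values()) / len(per_class)


def weighted_average(metric: str) -> float:
    """Mean of a metric across classes, weighted by class support."""
    total = sum(c["support"] for c in per_class.values())
    return sum(c[metric] * c["support"] for c in per_class.values()) / total


if __name__ == "__main__":
    for metric in ("precision", "recall", "f1"):
        print(
            f"{metric}: macro={macro_average(metric):.2f}, "
            f"weighted={weighted_average(metric):.2f}"
        )
```

All six values round to 0.83, which agrees with the reported averages. The support-weighted recall also equals the overall accuracy, which is why both are 0.83.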

📄 License

MIT License: feel free to use for academic and non-commercial projects.

✍️ Author

Made with ❤️ by WhiterBB as part of a final master's thesis (TFM) in Artificial Intelligence.
