# Multilingual Hate Speech Detection - XLM-RoBERTa
This is a fine-tuned version of XLM-RoBERTa Base trained for multilingual hate speech detection in Spanish 🇪🇸, English 🇬🇧, and French 🇫🇷.
It is part of a master's thesis project focused on real-time detection of hate in videos and transcripts.
## Intended Use
This model is designed to work with short- to medium-length text snippets extracted from video subtitles or transcripts.
It returns a binary classification (`hate` or `not hate`) along with a probability score for further analysis.
## Training Data
This model was fine-tuned on a custom multilingual dataset composed of selected and preprocessed samples from multiple public corpora and custom-curated sets. The training set was carefully constructed to achieve language balance and mitigate demographic bias in hate speech detection.
| Source Dataset | Language(s) | Description |
|---|---|---|
| `manueltonneau/spanish-hate-speech-superset` | Spanish 🇪🇸 | Aggregated Spanish hate speech datasets. |
| `manueltonneau/english-hate-speech-superset` | English 🇬🇧 | Extensive superset with over 300k samples from English corpora. |
| `manueltonneau/french-hate-speech-superset` | French 🇫🇷 | Curated superset from multiple French datasets. |
| HateCheck | English (original) + Spanish + French | Translated into Spanish and French to test multilingual generalization and error cases. |
| Custom Bias Correction Dataset | Multilingual | Designed to mitigate gender, racial, and cultural bias in predictions. |
The final dataset consists of ~60,000 balanced samples, with comparable representation across Spanish, English, and French, ensuring no language dominates the training phase.
This balancing process involved sampling, filtering, and label unification from larger sources. The result is a compact, diverse, and inclusive dataset designed to generalize across cultures and languages while avoiding common pitfalls in hate speech modeling.
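The per-language balancing step described above can be sketched as follows. This is a minimal illustration using toy data and random downsampling; the field names and sampling strategy are assumptions for demonstration, not the exact pipeline used in the thesis:

```python
import random
from collections import Counter

def balance_by_language(samples, seed=42):
    """Downsample each language group to the size of the smallest one."""
    by_lang = {}
    for s in samples:
        by_lang.setdefault(s["lang"], []).append(s)
    target = min(len(group) for group in by_lang.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_lang.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Toy corpus with unequal language representation
corpus = (
    [{"lang": "en", "text": f"en-{i}"} for i in range(300)]
    + [{"lang": "es", "text": f"es-{i}"} for i in range(120)]
    + [{"lang": "fr", "text": f"fr-{i}"} for i in range(150)]
)
balanced = balance_by_language(corpus)
print(Counter(s["lang"] for s in balanced))  # each language capped at 120
```

The actual dataset construction also involved filtering and label unification across sources, which this sketch omits.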
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model = AutoModelForSequenceClassification.from_pretrained("WhiterBB/multilingual-hatespeech-detection")
tokenizer = AutoTokenizer.from_pretrained("WhiterBB/multilingual-hatespeech-detection")

text = "Je déteste cette personne"  # French: "I hate this person"
inputs = tokenizer(text, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and pick the most likely class
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs).item()
confidence = probs[0][predicted_class].item()

label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({confidence:.2%})")
```
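The final softmax-and-argmax step can be illustrated without loading the model. The logit values below are made up for illustration and are not real model output:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [Not Hate, Hate]
logits = [-1.2, 2.3]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({probs[predicted_class]:.2%})")
```

Because the model is binary, the probability of the predicted class doubles as a confidence score you can threshold in downstream analysis.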
## Metrics
The model was evaluated on a balanced multilingual dataset consisting of over 56,000 examples. Below are the performance metrics:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Not Hate | 0.85 | 0.83 | 0.84 | 30,352 |
| Hate | 0.81 | 0.83 | 0.82 | 26,609 |

- **Overall Accuracy:** 0.83
- **Macro Average:** Precision 0.83, Recall 0.83, F1-score 0.83
- **Weighted Average:** Precision 0.83, Recall 0.83, F1-score 0.83
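The aggregate rows follow directly from the per-class numbers: the macro average is the unweighted mean over the two classes, while the weighted average weights each class by its support. A quick arithmetic check using the table values:

```python
# Per-class F1 and support taken from the table above
f1 = {"not_hate": 0.84, "hate": 0.82}
support = {"not_hate": 30352, "hate": 26609}
total = sum(support.values())

# Macro average: unweighted mean over classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: mean weighted by class support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"Macro F1: {macro_f1:.2f}")        # 0.83
print(f"Weighted F1: {weighted_f1:.2f}")  # 0.83
```

The two averages nearly coincide here because the class supports are close to balanced.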
## License
MIT License. Feel free to use it for academic and non-commercial projects.
## Author
Made with ❤️ by WhiterBB as part of a final master's thesis (TFM) in Artificial Intelligence.