# Multilingual Hate Speech Detection - XLM-RoBERTa
This is a fine-tuned version of XLM-RoBERTa Base trained for multilingual hate speech detection in Spanish 🇪🇸, English 🇬🇧, and French 🇫🇷.
It is part of a master's thesis project focused on real-time detection of hate in videos and transcripts.
## Intended Use
This model is designed to work with short- to medium-length text snippets extracted from video subtitles or transcripts.
It returns a binary classification (`hate` or `not hate`) along with a probability score for further analysis.
## Training Data
This model was fine-tuned on a custom multilingual dataset composed of selected and preprocessed samples from multiple public corpora and custom-curated sets. The training set was carefully constructed to achieve language balance and mitigate demographic bias in hate speech detection.
| Source Dataset | Language(s) | Description |
|---|---|---|
| `manueltonneau/spanish-hate-speech-superset` | Spanish 🇪🇸 | Aggregated Spanish hate speech datasets. |
| `manueltonneau/english-hate-speech-superset` | English 🇬🇧 | Extensive superset with over 300k samples from English corpora. |
| `manueltonneau/french-hate-speech-superset` | French 🇫🇷 | Curated superset from multiple French datasets. |
| HateCheck | English (original) + Spanish + French | Translated into Spanish and French to test multilingual generalization and error cases. |
| Custom Bias Correction Dataset | Multilingual | Designed to mitigate gender, racial, and cultural bias in predictions. |
The final dataset consists of ~60,000 balanced samples, with comparable representation across Spanish, English, and French, ensuring no language dominates the training phase.
This balancing process involved sampling, filtering, and label unification from larger sources. The result is a compact, diverse, and inclusive dataset designed to generalize across cultures and languages while avoiding common pitfalls in hate speech modeling.
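The per-language balancing step described above can be sketched as follows. This is a minimal illustration using toy data and random downsampling; the field names and sampling strategy are assumptions for demonstration, not the exact pipeline used in the thesis:

```python
import random
from collections import Counter

def balance_by_language(samples, seed=42):
    """Downsample each language group to the size of the smallest one."""
    by_lang = {}
    for s in samples:
        by_lang.setdefault(s["lang"], []).append(s)
    target = min(len(group) for group in by_lang.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_lang.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Toy corpus with unequal language representation
corpus = (
    [{"lang": "en", "text": f"en-{i}"} for i in range(300)]
    + [{"lang": "es", "text": f"es-{i}"} for i in range(120)]
    + [{"lang": "fr", "text": f"fr-{i}"} for i in range(150)]
)
balanced = balance_by_language(corpus)
print(Counter(s["lang"] for s in balanced))  # each language capped at 120
```

The actual dataset construction also involved filtering and label unification across sources, which this sketch omits.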
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model = AutoModelForSequenceClassification.from_pretrained("WhiterBB/multilingual-hatespeech-detection")
tokenizer = AutoTokenizer.from_pretrained("WhiterBB/multilingual-hatespeech-detection")

text = "Je déteste cette personne"  # French: "I hate this person"
inputs = tokenizer(text, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and pick the most likely class
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs).item()
confidence = probs[0][predicted_class].item()

label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({confidence:.2%})")
```
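The final softmax-and-argmax step can be illustrated without loading the model. The logit values below are made up for illustration and are not real model output:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [Not Hate, Hate]
logits = [-1.2, 2.3]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
label = "Hate" if predicted_class == 1 else "Not Hate"
print(f"{label} ({probs[predicted_class]:.2%})")
```

Because the model is binary, the probability of the predicted class doubles as a confidence score you can threshold in downstream analysis.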
## Metrics
The model was evaluated on a balanced multilingual dataset consisting of over 56,000 examples. Below are the performance metrics:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Not Hate | 0.85 | 0.83 | 0.84 | 30,352 |
| Hate | 0.81 | 0.83 | 0.82 | 26,609 |

- **Overall Accuracy:** 0.83
- **Macro Average:** Precision 0.83, Recall 0.83, F1-score 0.83
- **Weighted Average:** Precision 0.83, Recall 0.83, F1-score 0.83
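The aggregate rows follow directly from the per-class numbers: the macro average is the unweighted mean over the two classes, while the weighted average weights each class by its support. A quick arithmetic check using the table values:

```python
# Per-class F1 and support taken from the table above
f1 = {"not_hate": 0.84, "hate": 0.82}
support = {"not_hate": 30352, "hate": 26609}
total = sum(support.values())

# Macro average: unweighted mean over classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: mean weighted by class support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"Macro F1: {macro_f1:.2f}")        # 0.83
print(f"Weighted F1: {weighted_f1:.2f}")  # 0.83
```

The two averages nearly coincide here because the class supports are close to balanced.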
## License
MIT License. Feel free to use it for academic and non-commercial projects.
## Author
Made with ❤️ by WhiterBB as part of a final master's thesis (TFM) in Artificial Intelligence.