Model Card for roberta-offensive-classifier
This model card describes roberta-offensive-classifier, a binary text classification model fine-tuned to detect offensive and hateful language. It is built on FacebookAI's RoBERTa-base architecture, is intended for moderating user-generated content, and was developed as part of the DTCCT project.
Model Details
Model Description
- Developed by: Kanishk Verma
- Funded by: Google and Research Ireland under grant number EPSPG/2021/161
- Model type: Sequence Classification (Binary)
- Language(s) (NLP): EN
- License: cc-by-nc-3.0
- Finetuned from model: FacebookAI/roberta-base
Uses
Direct Use
The model can be used for classifying English text as offensive or non-offensive, supporting automated moderation in:
- Social media platforms
- Forums
- Online communities
Downstream Use
The model can be integrated into moderation pipelines or tools with additional features such as user feedback, flagging systems, or multi-language support.
Out-of-Scope Use
This model should not be used for:
- Legal decision-making
- Real-time moderation without human oversight
- Texts in languages other than English
Bias, Risks, and Limitations
The model is trained on datasets labeled for offensive and hateful language and may carry annotation biases. It may not generalize well to niche domains or novel forms of offensive speech.
Recommendations
- Human review should accompany model predictions in sensitive contexts.
- Evaluate on target data before deployment.
- Be cautious of over-filtering legitimate speech.
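One way to act on these recommendations is to route predictions by confidence instead of acting on every positive label. The sketch below is a hypothetical helper, not part of the model's release: the label name `OFFENSIVE` and the `HIGH_CONFIDENCE` threshold are assumptions, so check the labels your pipeline actually returns (for example in the model's `id2label` config) and tune the threshold on your own data.

```python
# Hypothetical routing helper. The label name and threshold below are
# assumptions for illustration, not values published with the model.
OFFENSIVE_LABEL = "OFFENSIVE"  # assumed; inspect the model's id2label mapping
HIGH_CONFIDENCE = 0.90         # assumed; tune on your own target data


def route(prediction: dict) -> str:
    """Map one pipeline output ({'label': ..., 'score': ...}) to an action."""
    label, score = prediction["label"], prediction["score"]
    if label == OFFENSIVE_LABEL:
        # Offensive predictions always reach a human; confident ones go first.
        return "priority_review" if score >= HIGH_CONFIDENCE else "review"
    # Uncertain non-offensive predictions are also reviewed rather than trusted.
    return "allow" if score >= HIGH_CONFIDENCE else "review"
```

In this design no text is removed automatically: the model only decides what a moderator sees and in which order, which keeps a human in the loop and limits over-filtering.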
How to Get Started with the Model
from transformers import pipeline
classifier = pipeline("text-classification", model="adaptcentre/roberta-offensive-classifier")
classifier("Your input text here")
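For more than one input, the same pipeline accepts a list of strings. The sketch below wraps that call in a small helper. `load_classifier` and `classify_batch` are hypothetical names introduced here, and the label strings in the output depend on the model's configuration, which this card does not list. `truncation=True` is passed so that inputs longer than the model's maximum sequence length are cut instead of raising an error.

```python
from typing import Callable, Iterable, List, Tuple


def load_classifier():
    """Load the pipeline from the Hub (needs `transformers` and network access)."""
    # Imported lazily so classify_batch below has no hard dependency.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="adaptcentre/roberta-offensive-classifier",
    )


def classify_batch(texts: Iterable[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for each input text."""
    texts = list(texts)
    # truncation=True cuts over-long inputs to the model's maximum length.
    results = classifier(texts, truncation=True)
    return [(text, r["label"], float(r["score"])) for text, r in zip(texts, results)]


# Usage (downloads the model on the first call):
#   clf = load_classifier()
#   for text, label, score in classify_batch(["Have a nice day!"], clf):
#       print(label, round(score, 3), text)
```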
Training Details
Training Data
The model was trained on a composite dataset covering:
- Offensive language
- Hate speech
- Toxicity
Testing Data, Factors & Metrics
[More Information Needed]
Metrics
- accuracy
- precision
- recall
- f1
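For reference, all four metrics can be derived from the confusion counts of a binary classifier. The sketch below assumes that label `1` marks the offensive class and `0` the non-offensive class. The card does not state the label encoding, or whether the reported scores are for the positive class only or averaged over classes, so treat this as an illustration of the standard definitions and not as the evaluation script used here. The same function can be used to evaluate the model on your own target data before deployment.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with 1 as the positive class (assumed: offensive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, predicted offensive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, predicted offensive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, predicted benign
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, predicted benign

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = binary_metrics(gold, pred)
```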
Results
| Metric | Score |
|---|---|
| Accuracy | 0.8856 |
| Precision | 0.8334 |
| Recall | 0.7932 |
| F1 Score | 0.8128 |
Summary
The model reaches an accuracy of 0.8856 and an F1 score of 0.8128, with precision (0.8334) higher than recall (0.7932). These results support its use for moderating English text, subject to the limitations and recommendations above.
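As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two scores. This assumes all three were computed on the same evaluation set and for the same class, which the card does not state explicitly.

```python
precision = 0.8334  # reported in the results above
recall = 0.7932     # reported in the results above

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8128, matching the reported F1 score
```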
BibTeX:
@misc{roberta-offensive-2025,
  title = {RoBERTa Base Offensive Language Classifier},
  author = {Kanishk Verma},
}