Model Card for roberta-offensive-classifier

This model card describes roberta-offensive-classifier, a binary text classification model fine-tuned to detect offensive and hateful language in English. It is built on FacebookAI's RoBERTa-base architecture, is intended for moderating user-generated content, and was developed as part of the DTCCT project.

Model Details

Model Description

Developed by: Kanishk Verma
Funded by: Google and Research Ireland under grant number EPSPG/2021/161
Model type: Sequence Classification (Binary)
Language(s) (NLP): English
License: cc-by-nc-3.0
Finetuned from model: FacebookAI/roberta-base

Uses

Direct Use

The model can be used for classifying English text as offensive or non-offensive, supporting automated moderation in:

Social media platforms
Forums
Online communities

Downstream Use

The model can be integrated into moderation pipelines or tools with additional features such as user feedback, flagging systems, or multi-language support.

Out-of-Scope Use

This model should not be used for:

Legal decision-making
Real-time moderation without human oversight
Texts in languages other than English

Bias, Risks, and Limitations

The model is trained on datasets labeled for offensive and hateful language and may carry annotation biases. It may not generalize well to niche domains or novel forms of offensive speech.

Recommendations

Human review should accompany model predictions in sensitive contexts.
Evaluate on target data before deployment.
Be cautious of over-filtering legitimate speech.
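One way to act on these recommendations is to route predictions by confidence rather than removing content automatically. The sketch below is illustrative only. The function name `route_prediction`, the 0.95 threshold, and the assumption that the offensive class is reported as `LABEL_1` are hypothetical and not taken from this model. Check `classifier.model.config.id2label` for the actual label names and tune the threshold on your own data.

```python
from typing import Dict, Union

Prediction = Dict[str, Union[str, float]]


def route_prediction(
    pred: Prediction,
    offensive_label: str = "LABEL_1",  # assumption: verify against config.id2label
    auto_threshold: float = 0.95,      # hypothetical value: tune on target data
) -> str:
    """Decide what to do with one pipeline prediction.

    `pred` is a dict shaped like the text-classification pipeline output,
    for example {"label": "LABEL_1", "score": 0.97}.
    """
    if pred["score"] < auto_threshold:
        # Low confidence: always defer to a person.
        return "human_review"
    if pred["label"] == offensive_label:
        # Confident offensive prediction: flag it, but do not delete automatically.
        return "flag_for_moderator"
    return "allow"


print(route_prediction({"label": "LABEL_1", "score": 0.60}))  # human_review
```

Keeping a moderator in the loop even for confident offensive predictions is a deliberate choice here, since it limits over-filtering of legitimate speech.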

How to Get Started with the Model

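from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="adaptcentre/roberta-offensive-classifier")
# The result is a list with one dict per input, each holding a "label" and a "score".
print(classifier("Your input text here"))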
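For more than one text, the pipeline also accepts a list. The following is a minimal sketch, assuming the same model ID as the snippet above. The helper names `load_classifier` and `classify_texts` and the batch size of 16 are hypothetical. `truncation=True` is passed because RoBERTa-base accepts at most 512 tokens per input, so longer texts would otherwise raise an error.

```python
from typing import Any, Callable, Dict, Iterable, List

MODEL_ID = "adaptcentre/roberta-offensive-classifier"  # taken from the snippet above


def load_classifier() -> Callable[..., List[Dict[str, Any]]]:
    """Build the pipeline. Imported lazily so that classify_texts can be
    exercised with a stub without downloading the model."""
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_texts(
    classifier: Callable[..., List[Dict[str, Any]]],
    texts: Iterable[str],
    batch_size: int = 16,  # hypothetical default: adjust to available memory
) -> List[Dict[str, Any]]:
    """Classify several texts and keep each text next to its prediction."""
    texts = list(texts)
    if not texts:
        return []
    # truncation=True cuts inputs that exceed the model's maximum length.
    results = classifier(texts, batch_size=batch_size, truncation=True)
    return [{"text": text, **result} for text, result in zip(texts, results)]


# Usage (downloads the model on first call):
# classifier = load_classifier()
# for row in classify_texts(classifier, ["first comment", "second comment"]):
#     print(row["text"], row["label"], row["score"])
```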

Training Details

Training Data

Trained on a composite dataset targeting:

Offensive language
Hate speech
Toxicity

Testing Data, Factors & Metrics

[More Information Needed]

Metrics

- Accuracy
- Precision
- Recall
- F1

Results

Metric      Score
Accuracy    0.8856
Precision   0.8334
Recall      0.7932
F1 Score    0.8128
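To evaluate on your own data, the same four metrics can be computed from gold labels and predictions. The sketch below is a plain-Python illustration, not the evaluation code used for this model. It assumes binary labels with 1 as the offensive class, and the function names are hypothetical. The card does not state whether the scores above were computed for the offensive class or averaged across classes. The final lines show that the reported F1 score matches the harmonic mean of the reported precision and recall.

```python
from typing import Dict, Sequence


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Consistency check on the reported scores.
print(round(f1_from(0.8334, 0.7932), 4))  # 0.8128
```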

Summary

The model reaches a reported accuracy of 0.8856 and an F1 score of 0.8128, with precision (0.8334) somewhat higher than recall (0.7932). These results support its use for moderating English text, subject to the recommendations above.

BibTeX:

@misc{roberta-offensive-2025,
  title  = {RoBERTa Base Offensive Language Classifier},
  author = {Kanishk Verma},
}
