Model Card for roberta-offensive-classifier

This model card describes roberta-offensive-classifier, a binary text classification model fine-tuned to detect offensive and hateful language. Built on FacebookAI's RoBERTa-base, it is intended for moderating user-generated content and was developed as part of the DTCCT project.

Model Details

Model Description

  • Developed by: Kanishk Verma
  • Funded by: Google and Research Ireland under grant number EPSPG/2021/161
  • Model type: Sequence Classification (Binary)
  • Language(s) (NLP): English
  • License: cc-by-nc-3.0
  • Fine-tuned from model: FacebookAI/roberta-base

Uses

Direct Use

The model can be used for classifying English text as offensive or non-offensive, supporting automated moderation in:

  • Social media platforms
  • Forums
  • Online communities

Downstream Use

The model can be integrated into moderation pipelines or tools with additional features such as user feedback, flagging systems, or multi-language support.

Out-of-Scope Use

This model should not be used for:

  • Legal decision-making
  • Real-time moderation without human oversight
  • Texts in languages other than English

Bias, Risks, and Limitations

The model is trained on datasets labeled for offensive and hateful language and may carry annotation biases. It may not generalize well to niche domains or novel forms of offensive speech.

Recommendations

  • Human review should accompany model predictions in sensitive contexts.
  • Evaluate on target data before deployment.
  • Be cautious of over-filtering legitimate speech.
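
One way to combine these recommendations is to act automatically only on high-confidence predictions and send borderline cases to a human moderator. The sketch below is a minimal illustration, not part of the released model: the label name `LABEL_1` for the offensive class, the function names, and both thresholds are assumptions that should be checked against the model's configuration and tuned on target data. It takes a single prediction in the `{"label": ..., "score": ...}` form returned by the text-classification pipeline.

```python
def offensive_probability(prediction, offensive_label="LABEL_1"):
    """Convert a top-1 binary prediction into P(offensive).

    Assumes the score is the probability of the predicted label, so the
    other class has probability 1 - score. "LABEL_1" is a hypothetical
    label name; check the model's id2label mapping before relying on it.
    """
    score = prediction["score"]
    return score if prediction["label"] == offensive_label else 1.0 - score


def route(prediction, offensive_label="LABEL_1", flag_at=0.90, review_at=0.50):
    """Map one prediction to a moderation action.

    The thresholds are illustrative defaults, not values from the model card.
    """
    p = offensive_probability(prediction, offensive_label)
    if p >= flag_at:
        return "flag"          # high confidence: flag automatically
    if p >= review_at:
        return "human_review"  # uncertain: defer to a moderator
    return "allow"             # likely non-offensive: let it through


# Hand-written predictions standing in for real pipeline output.
examples = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_1", "score": 0.70},
    {"label": "LABEL_0", "score": 0.98},
]
for pred in examples:
    print(pred, "->", route(pred))
```

Raising `flag_at` reduces the risk of over-filtering legitimate speech at the cost of a larger human review queue.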

How to Get Started with the Model

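from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="adaptcentre/roberta-offensive-classifier")

# The pipeline returns a list with one dict per input, each holding a "label" and a "score".
print(classifier("Your input text here"))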

Training Details

Training Data

The model was trained on a composite dataset covering:

  • Offensive language
  • Hate speech
  • Toxicity

Testing Data, Factors & Metrics

Metrics

  • Accuracy
  • Precision
  • Recall
  • F1 score

Results

  Metric     Score
  Accuracy   0.8856
  Precision  0.8334
  Recall     0.7932
  F1 score   0.8128
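
To reproduce this kind of evaluation on your own target data, the four metrics can be computed directly from gold and predicted labels. The sketch below is a minimal, dependency-free illustration. It assumes precision, recall, and F1 are computed for the positive (offensive) class, which the card does not state explicitly, and the toy labels are made up for demonstration rather than taken from the model's test data. The last line checks that the reported F1 score is consistent with the harmonic mean of the reported precision and recall.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two values; 0.0 when both are zero."""
    return 2 * a * b / (a + b) if a + b else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),
    }


# Toy labels for illustration only (1 = offensive, 0 = non-offensive).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(gold, pred))

# Consistency check on the reported scores.
print(round(harmonic_mean(0.8334, 0.7932), 4))  # 0.8128
```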

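Summary

The model reaches an accuracy of 0.8856 and an F1 score of 0.8128 on the test data, with precision (0.8334) somewhat higher than recall (0.7932). These results support its use for moderating English text, subject to the limitations described above.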

BibTeX:

@misc{roberta-offensive-2025,
  title  = {RoBERTa Base Offensive Language Classifier},
  author = {Kanishk Verma}
}
