Model Card for Davephoenix/bert-bullying-detector

A BERT-based binary classifier that detects whether a given English text contains bullying content, intended for use in moderation tools, education platforms, and social media analysis.

Model Details

Model Description

This model is based on bert-base-uncased and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.

  • Developed by: Davephoenix
  • Funded by: Independent project
  • Shared by: Davephoenix
  • Model type: Text classification (binary)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: bert-base-uncased

Uses

Direct Use

  • Classifies short- to medium-length English text as "Bullying" or "Not Bullying".
  • Can be integrated into moderation tools, educational apps, or awareness platforms.

Downstream Use

  • As a building block in broader moderation or digital well-being systems.
  • Further fine-tuning is possible for specific platforms or domains (see the sketch below).
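
A minimal fine-tuning sketch using the Hugging Face Trainer is shown below. The CSV file name and the "text"/"label" column names are placeholders, and the epoch count and batch size are illustrative values, not the author's original setup.

# Hedged sketch: continue fine-tuning the published checkpoint on domain data.
# "domain_train.csv" and the "text"/"label" columns are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("csv", data_files={"train": "domain_train.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bullying-detector-domain",
    num_train_epochs=2,              # illustrative value
    per_device_train_batch_size=16,  # illustrative value
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()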

Out-of-Scope Use

  • Multilingual or non-English bullying detection.
  • Misuse in legal or disciplinary decision-making without human oversight.
  • Sarcasm, coded language, or highly contextual text, where predictions may be unreliable.

Bias, Risks, and Limitations

The model may exhibit limitations in:

  • Cultural or contextual understanding of bullying.
  • Identifying subtle or sarcastic forms of harassment.
  • Avoiding false positives on emotionally intense or confrontational but non-abusive language.

Recommendations

Users (both direct and downstream) should:

  • Use the model alongside human review, especially in sensitive domains.
  • Avoid deploying in high-stakes environments without thorough testing.
  • Consider domain-specific fine-tuning if used outside general English online text.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    """Return (predicted class id, confidence) for a single input string."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # inference only; no gradients needed
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=1)  # convert logits to class probabilities
    pred = torch.argmax(probs, dim=1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}
text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")

Training Details

Training Data

  • Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
  • Balanced dataset curated from public moderation datasets and synthetic augmentation

Training Procedure

Preprocessing

  • Tokenized using bert-base-uncased tokenizer
  • Truncation and padding to a max_length of 128 tokens (see the sketch below)
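
A sketch of this preprocessing step is shown below, assuming a datasets-style corpus; the "text" column name is a placeholder.

# Hedged sketch of the preprocessing described above: bert-base-uncased tokenizer,
# truncation and padding to 128 tokens. The "text" column name is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# tokenized = dataset.map(preprocess, batched=True)  # applied over a datasets.Dataset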

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Epochs: 3
  • Batch size: 32
  • Optimizer: AdamW with linear warmup
  • Learning rate: 2e-5 (these settings are mapped to TrainingArguments in the sketch below)
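
The listed hyperparameters map onto transformers TrainingArguments roughly as sketched below; the warmup ratio and output directory are assumptions, since the card only states "linear warmup".

# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# warmup_ratio and output_dir are assumptions, not values stated in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-bullying-detector",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="linear",   # linear schedule with warmup
    warmup_ratio=0.1,             # assumed warmup fraction
    fp16=True,                    # mixed-precision training
)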

Speeds, Sizes, Times

  • Training time: ~5 hours on a Kaggle P100 GPU
  • Model size: ~420MB
  • Final Checkpoint: checkpoint-34371

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • 10% hold-out split from the training set (see the split sketch below)
  • Similar distribution to training data
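
A minimal sketch of the hold-out split using datasets is shown below; the toy in-memory examples and the seed are placeholders for the real labeled corpus.

# Hedged sketch: 10% hold-out evaluation split. The toy rows stand in for the
# real ~20,000-example corpus, and the seed value is arbitrary.
from datasets import Dataset

dataset = Dataset.from_dict({
    "text": ["example a", "example b", "example c", "example d", "example e",
             "example f", "example g", "example h", "example i", "example j"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]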

Factors

  • Sentence structure
  • Presence of explicit abusive terms
  • Subtlety of intent

Metrics

  • Accuracy, F1 score, and validation loss (computed as sketched below)
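
A sketch of how these metrics can be computed from Trainer-style (logits, labels) pairs with scikit-learn is shown below; whether the author used this exact function is an assumption.

# Hedged sketch: accuracy and F1 from model predictions, using scikit-learn.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1 (positive class = 1, "Bullying")
    }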

Results

  • Accuracy: 95.6%
  • F1 Score: 95.6%
  • Validation Loss: 0.151

Summary

The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.

Environmental Impact

Carbon emissions estimated via ML CO2 calculator:

  • Hardware Type: NVIDIA P100
  • Hours used: ~5
  • Cloud Provider: Kaggle
  • Compute Region: North America
  • Carbon Emitted: < 2 kg CO₂

Technical Specifications

Model Architecture and Objective

  • Architecture: BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
  • Objective: Binary sequence classification with cross-entropy loss (illustrated below)
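
With two labels configured, AutoModelForSequenceClassification attaches a classification head and computes cross-entropy loss whenever labels are passed, as the minimal check below illustrates; the example input is arbitrary.

# Minimal illustration: passing labels to the sequence-classification model
# returns a cross-entropy loss alongside the logits.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("example text", return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([1]))
print(outputs.loss)    # cross-entropy loss for the labeled example
print(outputs.logits)  # shape (1, 2): one logit per class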

Compute Infrastructure

Hardware

  • Kaggle P100 GPU (free tier)

Software

  • transformers 4.39.3
  • datasets 2.19.1
  • Python 3.11
  • PyTorch 2.x

Citation

BibTeX:

@misc{bert-bullying-detector,
  title={BERT Bullying Detector},
  author={Davephoenix},
  year={2025},
  note={Fine-tuned BERT for binary text classification (bullying detection)},
  howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}

APA:

Davephoenix. (2025). BERT Bullying Detector [Computer software]. Hugging Face. https://huggingface.co/Davephoenix/bert-bullying-detector

Glossary

  • BERT: Bidirectional Encoder Representations from Transformers
  • FP16: 16-bit floating point precision
  • F1 Score: Harmonic mean of precision and recall, F1 = 2 · (precision · recall) / (precision + recall)

More Information

To request the training notebook or API wrapper, please contact the model author.

Model Card Authors

  • Davephoenix

Model Card Contact

