Model Card for Davephoenix/bert-bullying-detector
A BERT-based binary classifier that detects whether a given English text contains bullying content. It is fine-tuned for use in moderation tools, education platforms, and social media analysis.
Model Details
Model Description
This model is based on bert-base-uncased and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.
- Developed by: Davephoenix
- Funded by [optional]: Independent project
- Shared by [optional]: Davephoenix
- Model type: Text classification (binary)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model [optional]: bert-base-uncased
Model Sources [optional]
- Repository: https://huggingface.co/Davephoenix/bert-bullying-detector
- Demo [optional]: API in progress
Uses
Direct Use
- Classifies short- to medium-length English text as "Bullying" or "Not Bullying".
- Can be integrated into moderation tools, educational apps, or awareness platforms (see the sketch after this list).
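As a quick illustration of direct use, the sketch below calls the model through the transformers pipeline API. This is a minimal sketch, not the author's reference integration; the exact label strings depend on the id2label mapping in the model config and may surface as LABEL_0 / LABEL_1.

from transformers import pipeline

# Minimal direct-use sketch; label strings depend on the model's id2label config
classifier = pipeline("text-classification", model="Davephoenix/bert-bullying-detector")

result = classifier("Nobody likes you, just quit.")[0]
print(result["label"], round(result["score"], 3))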
Downstream Use [optional]
- As a building block in broader moderation or digital well-being systems.
- Further fine-tuning possible for specific platforms/domains.
Out-of-Scope Use
- Multilingual or non-English bullying detection.
- Misuse in legal or disciplinary decision-making without human oversight.
- Inference on sarcasm, coded language, or highly contextual text may be unreliable.
Bias, Risks, and Limitations
The model may exhibit limitations in:
- Cultural or contextual understanding of bullying.
- Identifying subtle or sarcastic forms of harassment.
- False positives in emotionally intense or confrontational but non-abusive language.
Recommendations
Users (both direct and downstream) should:
- Use the model alongside human review, especially in sensitive domains (see the sketch after this list).
- Avoid deploying in high-stakes environments without thorough testing.
- Consider domain-specific fine-tuning if used outside general English online text.
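To make the human-review recommendation concrete, here is a minimal triage sketch. It assumes the classify_text helper defined in the "How to Get Started" section below; the 0.90 threshold and the review_queue list are hypothetical placeholders to be replaced by a real moderation workflow.

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; tune per deployment
review_queue = []        # placeholder for a real moderation queue

def moderate(text):
    pred, confidence = classify_text(text)  # helper from "How to Get Started" below
    if confidence < REVIEW_THRESHOLD:
        review_queue.append(text)            # low confidence: defer to a human
        return "needs-review"
    return "auto-flag" if pred == 1 else "allow"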
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    # Tokenize, run a forward pass without gradients, and return the predicted
    # class index together with its softmax confidence.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}

text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")
Training Details
Training Data
- Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
- Balanced dataset curated from public moderation datasets and synthetic augmentation
Training Procedure
Preprocessing [optional]
- Tokenized using the bert-base-uncased tokenizer
- Truncation and padding to max_length of 128 tokens
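A minimal sketch of the preprocessing described above; the example sentence is illustrative, and the actual preprocessing script is not published here.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Every example is truncated or padded to exactly 128 tokens
enc = tokenizer("You are so dumb and nobody likes you.",
                truncation=True, padding="max_length", max_length=128)
print(len(enc["input_ids"]))  # 128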
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 3
- Batch size: 32
- Optimizer: AdamW with linear warmup
- Learning rate: 2e-5
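A minimal sketch of how these hyperparameters map onto the Hugging Face Trainer API follows. The two-example dataset is a toy stand-in for the ~20,000-sample corpus, and warmup_ratio is an assumption: the card states linear warmup but not the fraction.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-in for the real, balanced ~20k-example corpus
raw = Dataset.from_dict({
    "text": ["you are worthless", "great job on the project"],
    "label": [1, 0],
})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="bert-bullying-detector",
    num_train_epochs=3,              # Epochs: 3
    per_device_train_batch_size=32,  # Batch size: 32
    learning_rate=2e-5,              # Learning rate: 2e-5
    lr_scheduler_type="linear",      # AdamW (Trainer default) with a linear schedule
    warmup_ratio=0.1,                # assumed warmup fraction (not stated in the card)
    fp16=True,                       # fp16 mixed precision (requires a CUDA GPU)
)

Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer).train()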
Speeds, Sizes, Times [optional]
- Training time: ~5 hours on Kaggle GPU
- Model size: ~420MB
- Final checkpoint: checkpoint-34371
Evaluation
Testing Data, Factors & Metrics
Testing Data
- 10% hold-out split from the training set
- Similar distribution to training data
Factors
- Sentence structure
- Presence of explicit abusive terms
- Subtlety of intent
Metrics
- Accuracy, F1 score, Loss
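For reference, these metrics can be computed with standard tooling; a minimal sketch using scikit-learn is below (the label lists are placeholders, not the actual hold-out predictions).

from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold labels and predictions for the hold-out split
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))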
Results
- Accuracy: 95.6%
- F1 Score: 95.6%
- Validation Loss: 0.151
Summary
The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions estimated via ML CO2 calculator:
- Hardware Type: NVIDIA P100
- Hours used: ~5
- Cloud Provider: Kaggle
- Compute Region: North America
- Carbon Emitted: < 2 kg CO₂
Technical Specifications [optional]
Model Architecture and Objective
- Architecture: BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
- Objective: Binary sequence classification with cross-entropy loss
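As a brief illustration of the objective: when integer labels are passed to the sequence-classification head, the model returns the cross-entropy loss directly. This generic sketch uses the base checkpoint with a randomly initialized 2-class head, so the printed values are meaningless.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["an example sentence"], return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))
print(out.logits.shape)  # torch.Size([1, 2]): one logit per class
print(out.loss)          # cross-entropy loss used as the training objective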
Compute Infrastructure
Hardware
- Kaggle P100 GPU (free tier)
Software
- transformers 4.39.3
- datasets 2.19.1
- Python 3.11
- PyTorch 2.x
Citation [optional]
BibTeX:
@misc{bert-bullying-detector,
title={BERT Bullying Detector},
author={Davephoenix},
year={2025},
note={Fine-tuned BERT for binary text classification (bullying detection)},
howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}
APA:
Davephoenix. (2025). BERT Bullying Detector [Computer software]. Hugging Face. https://huggingface.co/Davephoenix/bert-bullying-detector
Glossary [optional]
- BERT: Bidirectional Encoder Representations from Transformers
- FP16: 16-bit floating point precision
- F1 Score: Harmonic mean of precision and recall
More Information [optional]
To request the training notebook or API wrapper, please contact the model author.
Model Card Authors [optional]
- Davephoenix
Model Card Contact
Contact Davephoenix via the model repository on Hugging Face: https://huggingface.co/Davephoenix/bert-bullying-detector