Model Card for Davephoenix/bert-bullying-detector
A BERT-based binary classifier that detects whether a given English text contains bullying content. It is fine-tuned for use in moderation tools, education platforms, and social media analysis.
Model Details
Model Description
This model is based on bert-base-uncased and fine-tuned for binary text classification. The goal is to distinguish between bullying and non-bullying text, providing a tool to support online safety and moderation.
- Developed by: Davephoenix
- Funded by [optional]: Independent project
- Shared by [optional]: Davephoenix
- Model type: Text classification (binary)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model [optional]: bert-base-uncased
Model Sources [optional]
- Repository: https://huggingface.co/Davephoenix/bert-bullying-detector
- Demo [optional]: API in progress
Uses
Direct Use
- Classifies short- to medium-length English text as "Bullying" or "Not Bullying".
- Can be integrated into moderation tools, educational apps, or awareness platforms (see the sketch after this list).
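As a quick illustration of direct use, the sketch below calls the model through the transformers pipeline API. This is a minimal sketch, not the author's reference integration; the exact label strings depend on the id2label mapping in the model config and may surface as LABEL_0 / LABEL_1.

from transformers import pipeline

# Minimal direct-use sketch; label strings depend on the model's id2label config
classifier = pipeline("text-classification", model="Davephoenix/bert-bullying-detector")

result = classifier("Nobody likes you, just quit.")[0]
print(result["label"], round(result["score"], 3))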
Downstream Use [optional]
- As a building block in broader moderation or digital well-being systems.
- Further fine-tuning possible for specific platforms/domains.
Out-of-Scope Use
- Multilingual or non-English bullying detection.
- Misuse in legal or disciplinary decision-making without human oversight.
- Inference on sarcasm, coded language, or highly contextual text may be unreliable.
Bias, Risks, and Limitations
The model may exhibit limitations in:
- Cultural or contextual understanding of bullying.
- Identifying subtle or sarcastic forms of harassment.
- False positives in emotionally intense or confrontational but non-abusive language.
Recommendations
Users (both direct and downstream) should:
- Use the model alongside human review, especially in sensitive domains (see the sketch after this list).
- Avoid deploying in high-stakes environments without thorough testing.
- Consider domain-specific fine-tuning if used outside general English online text.
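To make the human-review recommendation concrete, here is a minimal triage sketch. It assumes the classify_text helper defined in the "How to Get Started" section below; the 0.90 threshold and the review_queue list are hypothetical placeholders to be replaced by a real moderation workflow.

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; tune per deployment
review_queue = []        # placeholder for a real moderation queue

def moderate(text):
    pred, confidence = classify_text(text)  # helper from "How to Get Started" below
    if confidence < REVIEW_THRESHOLD:
        review_queue.append(text)            # low confidence: defer to a human
        return "needs-review"
    return "auto-flag" if pred == 1 else "allow"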
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Davephoenix/bert-bullying-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def classify_text(text):
    # Tokenize, run a forward pass without gradients, and return the predicted
    # class index together with its softmax confidence.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()
    return pred, probs[0][pred].item()

label_map = {0: "Not Bullying", 1: "Bullying"}

text = "You are so dumb and nobody likes you."
pred, confidence = classify_text(text)
print(f"Prediction: {label_map[pred]} (Confidence: {confidence:.2f})")
Training Details
Training Data
- Approximately 20,000 English text samples labeled as "bullying" or "not bullying"
- Balanced dataset curated from public moderation datasets and synthetic augmentation
Training Procedure
Preprocessing [optional]
- Tokenized using the bert-base-uncased tokenizer
- Truncation and padding to max_length of 128 tokens
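A minimal sketch of the preprocessing described above; the example sentence is illustrative, and the actual preprocessing script is not published here.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Every example is truncated or padded to exactly 128 tokens
enc = tokenizer("You are so dumb and nobody likes you.",
                truncation=True, padding="max_length", max_length=128)
print(len(enc["input_ids"]))  # 128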
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 3
- Batch size: 32
- Optimizer: AdamW with linear warmup
- Learning rate: 2e-5
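A minimal sketch of how these hyperparameters map onto the Hugging Face Trainer API follows. The two-example dataset is a toy stand-in for the ~20,000-sample corpus, and warmup_ratio is an assumption: the card states linear warmup but not the fraction.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-in for the real, balanced ~20k-example corpus
raw = Dataset.from_dict({
    "text": ["you are worthless", "great job on the project"],
    "label": [1, 0],
})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="bert-bullying-detector",
    num_train_epochs=3,              # Epochs: 3
    per_device_train_batch_size=32,  # Batch size: 32
    learning_rate=2e-5,              # Learning rate: 2e-5
    lr_scheduler_type="linear",      # AdamW (Trainer default) with a linear schedule
    warmup_ratio=0.1,                # assumed warmup fraction (not stated in the card)
    fp16=True,                       # fp16 mixed precision (requires a CUDA GPU)
)

Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer).train()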
Speeds, Sizes, Times [optional]
- Training time: ~5 hours on Kaggle GPU
- Model size: ~420MB
- Final checkpoint: checkpoint-34371
Evaluation
Testing Data, Factors & Metrics
Testing Data
- 10% hold-out split from the training set
- Similar distribution to training data
Factors
- Sentence structure
- Presence of explicit abusive terms
- Subtlety of intent
Metrics
- Accuracy, F1 score, Loss
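For reference, these metrics can be computed with standard tooling; a minimal sketch using scikit-learn is below (the label lists are placeholders, not the actual hold-out predictions).

from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold labels and predictions for the hold-out split
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))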
Results
- Accuracy: 95.6%
- F1 Score: 95.6%
- Validation Loss: 0.151
Summary
The model performs well for binary classification of bullying vs. non-bullying on general English text. Performance may degrade on ambiguous or culturally nuanced examples.
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions estimated via ML CO2 calculator:
- Hardware Type: NVIDIA P100
- Hours used: ~5
- Cloud Provider: Kaggle
- Compute Region: North America
- Carbon Emitted: < 2 kg CO₂
Technical Specifications [optional]
Model Architecture and Objective
- Architecture: BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
- Objective: Binary sequence classification with cross-entropy loss
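As a brief illustration of the objective: when integer labels are passed to the sequence-classification head, the model returns the cross-entropy loss directly. This generic sketch uses the base checkpoint with a randomly initialized 2-class head, so the printed values are meaningless.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["an example sentence"], return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))
print(out.logits.shape)  # torch.Size([1, 2]): one logit per class
print(out.loss)          # cross-entropy loss used as the training objective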
Compute Infrastructure
Hardware
- Kaggle P100 GPU (free tier)
Software
- transformers 4.39.3
- datasets 2.19.1
- Python 3.11
- PyTorch 2.x
Citation [optional]
BibTeX:
@misc{bert-bullying-detector,
title={BERT Bullying Detector},
author={Davephoenix},
year={2025},
note={Fine-tuned BERT for binary text classification (bullying detection)},
howpublished={\url{https://huggingface.co/Davephoenix/bert-bullying-detector}}
}
APA:
Davephoenix. (2025). BERT Bullying Detector [Computer software]. Hugging Face. https://huggingface.co/Davephoenix/bert-bullying-detector
Glossary [optional]
- BERT: Bidirectional Encoder Representations from Transformers
- FP16: 16-bit floating point precision
- F1 Score: Harmonic mean of precision and recall
More Information [optional]
To request the training notebook or API wrapper, please contact the model author.
Model Card Authors [optional]
- Davephoenix
Model Card Contact
Contact Davephoenix via the model repository on Hugging Face: https://huggingface.co/Davephoenix/bert-bullying-detector