LexiGuard: Misogyny, Misandry & Toxicity Detection in English and Slovak
LexiGuard is a multilingual multitask model designed to detect and classify offensive language, with a focus on misogyny, misandry, and toxicity levels in English. The model also supports Slovak, making it suitable for multilingual analysis of social media content.
It performs dual classification:
- Category: Misogyny, Misandry, or Neutral
- Toxicity level: Low, Medium, or High
The model is based on xlm-roberta-base and was fine-tuned on a custom dataset primarily in English, with additional annotated samples in Slovak.
Model Overview
- Base model: xlm-roberta-base
- Tasks: Multitask classification (2 output heads)
- Primary language: English
- Secondary language: Slovak
- Use case: Detecting offensive, sexist, or toxic comments in multilingual social media
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Megyy/lexiguard")
model = AutoModelForSequenceClassification.from_pretrained("Megyy/lexiguard")

text = "Women are useless in politics."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)
# outputs.logits contains predictions for both tasks
Note: The model has two output heads:
- Head 1: Category (misogyny/misandry/neutral)
- Head 2: Toxicity (low/medium/high)
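The card does not say how the two heads are laid out inside the logits. The sketch below is one possible way to separate them, assuming (this is not confirmed by the model card) that each row holds six values: the three category logits followed by the three toxicity logits. The helper names `split_heads` and `argmax` and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

N_CATEGORY = 3  # misogyny / misandry / neutral
N_TOXICITY = 3  # low / medium / high


def split_heads(logits_row: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one row of logits into (category_logits, toxicity_logits).

    ASSUMPTION: the row is laid out as [3 category logits | 3 toxicity logits].
    The model card does not document the layout, so verify it before relying on this.
    """
    expected = N_CATEGORY + N_TOXICITY
    if len(logits_row) != expected:
        raise ValueError(f"expected {expected} logits, got {len(logits_row)}")
    row = list(logits_row)
    return row[:N_CATEGORY], row[N_CATEGORY:]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits for illustration only, not real model output.
example_row = [0.2, 2.9, -1.1, -0.5, 0.3, 1.8]
category_logits, toxicity_logits = split_heads(example_row)
category_id = argmax(category_logits)
toxicity_id = argmax(toxicity_logits)
```

With the real model, the row would come from something like `outputs.logits[0].tolist()`. It is worth checking `outputs.logits.shape` first, since a different shape means the layout assumed here does not apply.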
Label Definitions
Task 1: Category Classification
- 0: Neutral
- 1: Misogyny
- 2: Misandry
Task 2: Toxicity Prediction
- 0: Low
- 1: Medium
- 2: High
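The label ids above can be turned into readable predictions with a small lookup. The following is a minimal sketch that takes the logits of each head separately, however they were obtained, and returns the label with its softmax probability. The names `ID2CATEGORY`, `ID2TOXICITY`, `softmax`, and `decode_head` are illustrative rather than part of the model's API, and the logits are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Label ids as listed in the definitions above.
ID2CATEGORY: Dict[int, str] = {0: "Neutral", 1: "Misogyny", 2: "Misandry"}
ID2TOXICITY: Dict[int, str] = {0: "Low", 1: "Medium", 2: "High"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_head(logits: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class of one head."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


# Invented logits for illustration only, not real model output.
category, category_p = decode_head([0.1, 3.0, -0.7], ID2CATEGORY)
toxicity, toxicity_p = decode_head([-1.2, 0.4, 2.2], ID2TOXICITY)
print(f"category={category} ({category_p:.2f}), toxicity={toxicity} ({toxicity_p:.2f})")
```

Keeping the probability alongside the label makes it possible to apply a confidence threshold later, for example to route low-confidence comments to manual review.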
Training Data
- Over 5,000 manually annotated comments
- Domain: Online discussions, social media, and forums
- Language distribution:
  - ~80% English
  - ~20% Slovak
Model Files
- pytorch_model.bin / model.safetensors: model weights
- config.json: model configuration
- tokenizer.json, vocab.txt, etc.: tokenizer files
- README.md: model card
Citation
If you use this model in your work, please cite:
@bachelorsthesis{majercakova2025lexiguard,
title={LexiGuard: Offensive Language Detection in English and Slovak Social Media},
author={Magdalena Majercakova},
year={2025},
note={Bachelor's thesis, TUKE},
}
Author
Developed by Magdaléna Majerčáková as part of a Bachelor's Thesis
Supervised by Ing. Zuzana Sokolová, PhD
Faculty of Electrical Engineering and Informatics, TUKE (2025)