🛡️ LexiGuard: Misogyny, Misandry & Toxicity Detection in English and Slovak

LexiGuard is a multilingual, multitask model that detects and classifies offensive language, focusing on misogyny, misandry, and toxicity level. It targets English first and also supports Slovak, which makes it suitable for analyzing multilingual social media content.

It performs dual classification:

  1. Category: Misogyny, Misandry, or Neutral
  2. Toxicity level: Low, Medium, or High

The model is based on xlm-roberta-base and was fine-tuned on a custom dataset primarily in English, with additional annotated samples in Slovak.


🧠 Model Overview

  • Base model: xlm-roberta-base
  • Tasks: Multitask classification (2 output heads)
  • Primary language: English
  • Secondary language: Slovak
  • Use case: Detecting offensive, sexist, or toxic comments in multilingual social media

🛠️ Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Megyy/lexiguard")
model = AutoModelForSequenceClassification.from_pretrained("Megyy/lexiguard")

text = "Women are useless in politics."
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits contains predictions for both tasks

Note: The model has two output heads:

  • Head 1: Category (misogyny/misandry/neutral)
  • Head 2: Toxicity (low/medium/high)
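The card does not document how the two heads are laid out inside outputs.logits. As a minimal sketch, assuming (hypothetically) that the logits arrive as one flat vector with the three category logits first and the three toxicity logits second, the heads could be separated and turned into per-head probabilities as follows. The helper names split_heads and softmax, the layout, and the sample numbers are all illustrative assumptions, not part of the released model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(flat_logits, n_category=3, n_toxicity=3):
    """Split one flat logit vector into (category, toxicity) logits.

    ASSUMPTION: category logits come first, toxicity logits second.
    The model card does not state the actual layout.
    """
    if len(flat_logits) != n_category + n_toxicity:
        raise ValueError("unexpected number of logits")
    return flat_logits[:n_category], flat_logits[n_category:]


# Made-up logits for illustration only, not real model output.
category_logits, toxicity_logits = split_heads([0.1, 2.3, -1.0, -0.5, 0.2, 1.9])
category_probs = softmax(category_logits)
toxicity_probs = softmax(toxicity_logits)
```

Applying softmax to each head separately keeps the two tasks independent: each head gets its own probability distribution over its three classes. If the checkpoint exposes its heads differently, only split_heads would need to change.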

📊 Label Definitions

Task 1 – Category Classification

  • 0: Neutral
  • 1: Misogyny
  • 2: Misandry

Task 2 – Toxicity Prediction

  • 0: Low
  • 1: Medium
  • 2: High
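The index-to-label mappings above can be written down as lookup tables. The sketch below takes one score list per task and returns the label with the highest score for each. The decode and argmax helpers and the example scores are hypothetical; only the label names and indices come from this card.

```python
# Label ids as listed in the definitions above.
CATEGORY_LABELS = {0: "Neutral", 1: "Misogyny", 2: "Misandry"}
TOXICITY_LABELS = {0: "Low", 1: "Medium", 2: "High"}


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode(category_scores, toxicity_scores):
    """Map per-task scores (logits or probabilities) to label names."""
    return {
        "category": CATEGORY_LABELS[argmax(category_scores)],
        "toxicity": TOXICITY_LABELS[argmax(toxicity_scores)],
    }


# Made-up scores for illustration only, not real model output.
example = decode([0.1, 2.3, -1.0], [-0.5, 0.2, 1.9])
```

Because argmax is unaffected by softmax, decode gives the same labels whether it receives raw logits or probabilities.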

🧪 Training Data

  • Over 5,000 manually annotated comments
  • Domain: Online discussions, social media, and forums
  • Language distribution:
    • ~80% English
    • ~20% Slovak

📁 Model Files

  • pytorch_model.bin / model.safetensors: model weights
  • config.json: model configuration
  • tokenizer.json, vocab.txt, etc.: tokenizer files
  • README.md: model card

📚 Citation

If you use this model in your work, please cite:

@bachelorsthesis{majercakova2025lexiguard,
  title={LexiGuard: Offensive Language Detection in English and Slovak Social Media},
  author={Magdalena Majercakova},
  year={2025},
  note={Bachelor's thesis, TUKE},
}

👨‍💻 Author

Developed by Magdaléna Majerčáková as part of a Bachelor's Thesis
Supervised by Ing. Zuzana Sokolová, PhD
Faculty of Electrical Engineering and Informatics, TUKE (2025)

