Gender Prediction from Text ✍️ → 👩‍🦰👨

This model predicts the likely gender of an anonymous speaker or writer based solely on the content of an English text. It is built on DeBERTa-v3-large and fine-tuned on a diverse, multi-domain dataset of formal and informal texts drawn from English, Polish, and Russian sources, with the non-English data translated into English.

Space link: 🔗 Try it out on Hugging Face Spaces
Model repo: 🔗 View on Hugging Face Hub
🧠 Source code: GitHub


📊 Model Summary

  • Base model: microsoft/deberta-v3-large
  • Fine-tuned on: binary gender classification task (female vs male)
  • Best F1 Score: 0.69 on a balanced multi-domain test set
  • Max token length: 128
  • Evaluation Metrics:
    • F1: 0.69
    • Accuracy: 0.69
    • Precision: 0.69
    • Recall: 0.69

📂 Evaluation: View on Notebook


🧾 Datasets Used

Dataset                                  | Domain                     | Type
-----------------------------------------|----------------------------|---------------------
samzirbo/europarl.en-es.gendered         | Formal speech (Parliament) | English
czyzi0/luna-speech-dataset               | Phone conversations        | Polish → Translated
czyzi0/pwr-azon-speech-dataset           | Phone conversations        | Polish → Translated
sagteam/author_profiling                 | Social posts               | Russian → Translated
kaushalgawri/nptel-en-tags-and-gender-v0 | Spoken transcripts         | English
Blog Authorship Corpus                   | Blog posts                 | English

All datasets were normalized, translated if necessary, deduplicated, and balanced via random undersampling to ensure equal representation of both genders.
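The deduplication and balancing steps could look roughly like the following. This is a minimal sketch, not the author's pipeline: the `(text, label)` tuple layout, the `balance_by_undersampling` helper, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=42):
    """Deduplicate (text, label) pairs, then undersample every class to the
    size of the smallest one. Hypothetical helper, not the author's code."""
    # Drop exact duplicate texts, keeping the first occurrence.
    seen = set()
    by_label = defaultdict(list)
    for text, label in samples:
        if text in seen:
            continue
        seen.add(text)
        by_label[label].append((text, label))

    # Randomly keep only as many samples per class as the smallest class has.
    rng = random.Random(seed)
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


data = [
    ("text a", "female"), ("text b", "female"), ("text c", "female"),
    ("text a", "female"),  # exact duplicate, removed before balancing
    ("text d", "male"), ("text e", "male"),
]
balanced = balance_by_undersampling(data)
counts = {
    label: sum(1 for _, lab in balanced if lab == label)
    for label in ("female", "male")
}
print(counts)  # {'female': 2, 'male': 2}
```

Undersampling discards data from the larger class, which keeps the classes equal in size at the cost of a smaller training set.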


🛠️ Preprocessing & Training

  • Normalization: Cleaned quotes, dashes, placeholders, noise, and HTML/code from all datasets.
  • Translation: Used Helsinki-NLP/opus-mt-* models for Polish and Russian data.
  • Undersampling: Random undersampling to balance male and female samples.
  • Training Strategy:
    • LR Finder used to optimize learning rate (2.66e-6)
    • Fine-tuned using early stopping on both F1 and loss
    • Step-based evaluation every 250 steps
    • Best checkpoint at step 24,750 saved and evaluated
  • Second Phase Fine-tuning:
    • Performed on full merged dataset for 2 epochs
    • Used cosine learning rate scheduler and warm-up steps
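The cosine schedule with warm-up used in the second phase can be illustrated as a function of the training step. This is a sketch under assumptions: the peak rate reuses the 2.66e-6 found by the LR finder (whether phase two used the same value is not stated), and the warm-up length and total step count are made-up numbers, since the card does not give them. The formula is the common linear warm-up followed by a half-cosine decay to zero.

```python
import math


def cosine_lr_with_warmup(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warm-up, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


PEAK_LR = 2.66e-6      # LR finder value; reuse in phase two is an assumption
WARMUP_STEPS = 500     # hypothetical
TOTAL_STEPS = 10_000   # hypothetical

for step in (0, 250, 500, 5_250, 10_000):
    lr = cosine_lr_with_warmup(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>6}: lr = {lr:.3e}")
```

The rate rises linearly during warm-up, peaks at the end of it, falls to half the peak midway through the decay phase, and reaches zero at the final step.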

📈 Performance (on full merged test set)

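Class     | Precision | Recall | F1-Score | Accuracy | Support
----------|-----------|--------|----------|----------|----------
Female    | 0.70      | 0.65   | 0.68     | -        | 591,027
Male      | 0.68      | 0.72   | 0.70     | -        | 591,027
Macro Avg | 0.69      | 0.69   | 0.69     | -        | 1,182,054
Accuracy  | -         | -      | -        | 0.69     | 1,182,054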
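As a sanity check, per-class F1 can be recomputed from the precision and recall reported above as their harmonic mean. The sketch below uses only the rounded values shown in this section, so the recomputed figure can differ from the reported one in the last digit; presumably the reported scores were computed from unrounded values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class values copied from the performance figures above.
reported = {
    "Female": {"precision": 0.70, "recall": 0.65, "f1": 0.68},
    "Male": {"precision": 0.68, "recall": 0.72, "f1": 0.70},
}

for name, m in reported.items():
    f1 = f1_score(m["precision"], m["recall"])
    print(f"{name}: recomputed F1 = {f1:.3f}, reported = {m['f1']:.2f}")

# The macro average is the unweighted mean of the per-class scores.
macro_f1 = sum(m["f1"] for m in reported.values()) / len(reported)
print(f"Macro F1 from reported per-class values = {macro_f1:.2f}")
```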

📦 Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "fc63/gender_prediction_model_from_text"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval().to(device)

def predict(text):
    # Tokenize and truncate to the model's 128-token maximum length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
        # Convert logits to class probabilities.
        probs = F.softmax(outputs.logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()
    confidence = round(probs[0][pred].item() * 100, 1)
    # Label index 0 is Female, 1 is Male.
    gender = "Female" if pred == 0 else "Male"
    return f"{gender} (Confidence: {confidence}%)"

sample_text = "I love writing in my journal every night. It helps me reflect on the day and plan for tomorrow."
print(predict(sample_text))

Output for this sample:

Female (Confidence: 84.1%)
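The post-processing inside `predict` (softmax, argmax, label lookup) can be examined without downloading the model. The sketch below reimplements that step in plain Python on made-up logits. The index-to-label mapping (0 = Female, 1 = Male) comes from the usage example, while the `decode_logits` helper and the example logits are hypothetical.

```python
import math

LABELS = {0: "Female", 1: "Male"}  # same mapping as in predict()


def decode_logits(logits):
    """Turn a list of raw logits into (label, confidence in percent)."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the most probable class and report its probability.
    pred = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[pred], round(probs[pred] * 100, 1)


# Made-up logits for illustration, not real model output.
label, confidence = decode_logits([0.2, 1.3])
print(f"{label} (Confidence: {confidence}%)")
```

The confidence is the softmax probability of the winning class, so a value near 50% means the two logits were nearly equal.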

📌 Future Work & Limitations

I do not intend to leave this model at 0.69 accuracy and F1.

As far as I can tell at this point, the model is biased towards predicting emotional, psychological, and introspective texts as female, and towards predicting more direct, result-oriented writing as male. Correcting this requires a large, carefully labeled dataset that includes counterexamples to this pattern.

The datasets used to train this model had to be obtained from open-source platforms, which limited the range of accessible data.

To make further progress, I need to create and label a larger dataset myself, which requires a significant amount of time, effort, and cost.

Before moving to dataset creation, I plan to try a few more approaches using the current dataset. So far, alternative techniques have not improved the scores without causing overfitting. If none of the remaining methods work, the only step left will be building a new dataset, and that will likely be the point where I stop development, as it will be both labor-intensive and costly for me.


👨‍🔬 Author & License

Author: Furkan Çoban
Project: CENG-481 Gender Prediction Model
License: MIT

Model size: 435M params (Safetensors, tensor type F32)