---
language: en
tags:
- text-classification
- gender
- gender-prediction
- transformers
- deberta
license: mit
datasets:
- samzirbo/europarl.en-es.gendered
- czyzi0/luna-speech-dataset
- czyzi0/pwr-azon-speech-dataset
- sagteam/author_profiling
- kaushalgawri/nptel-en-tags-and-gender-v0
metrics:
- accuracy
- f1
- precision
- recall
base_model: microsoft/deberta-v3-large
pipeline_tag: text-classification
model-index:
- name: gender_prediction_model_from_text
results:
- task:
type: text-classification
name: Text Classification
metrics:
- type: f1
value: 0.69
- type: accuracy
value: 0.69
citations:
- |-
@misc{fc63_gender1_2025,
title = {Gender Prediction from Text},
author = {Çoban, Furkan},
year = {2025},
howpublished = {\url{https://doi.org/10.5281/zenodo.15619489}},
note = {DeBERTa-v3-large model fine-tuned on multi-domain gender-labeled texts}
}
---

# Gender Prediction from Text

This model predicts the likely gender of an anonymous speaker or writer based solely on the content of an English text. It is built on DeBERTa-v3-large and fine-tuned on a diverse, multi-domain mix of formal and informal texts drawn from several source languages and translated into English.

- Space: Try it out on Hugging Face Spaces
- Model repo: [fc63/gender_prediction_model_from_text](https://huggingface.co/fc63/gender_prediction_model_from_text)
- Source code: GitHub
## Model Summary

- Base model: `microsoft/deberta-v3-large`
- Task: binary gender classification (`female` vs. `male`)
- Best F1 score: 0.69 on a balanced multi-domain test set
- Max token length: 128
- Evaluation metrics:
  - F1: 0.69
  - Accuracy: 0.69
  - Precision: 0.69
  - Recall: 0.69

Evaluation: View on Notebook
## Datasets Used

| Dataset | Domain | Language |
|---|---|---|
| samzirbo/europarl.en-es.gendered | Formal speech (Parliament) | English |
| czyzi0/luna-speech-dataset | Phone conversations | Polish → translated |
| czyzi0/pwr-azon-speech-dataset | Phone conversations | Polish → translated |
| sagteam/author_profiling | Social posts | Russian → translated |
| kaushalgawri/nptel-en-tags-and-gender-v0 | Spoken transcripts | English |
| Blog Authorship Corpus | Blog posts | English |
All datasets were normalized, translated if necessary, deduplicated, and balanced via random undersampling to ensure equal representation of both genders.
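As a rough illustration of the balancing step, here is a minimal undersampling sketch in pandas; the column names (`text`, `gender`) and the fixed seed are assumptions for illustration, not the project's actual schema.

```python
import pandas as pd

def balance_by_undersampling(df: pd.DataFrame, label_col: str = "gender", seed: int = 42) -> pd.DataFrame:
    """Randomly undersample each class down to the size of the smallest class."""
    min_count = df[label_col].value_counts().min()
    return (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=min_count, random_state=seed))
          .reset_index(drop=True)
    )

# Hypothetical usage: df has columns "text" and "gender" ("female" / "male").
# balanced = balance_by_undersampling(df)
```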
## Preprocessing & Training

- Normalization: cleaned quotes, dashes, placeholders, noise, and HTML/code from all datasets.
- Translation: used `Helsinki-NLP/opus-mt-*` models for the Polish and Russian data.
- Undersampling: random undersampling to balance male and female samples.
- Training strategy:
  - LR Finder used to select the learning rate (`2.66e-6`)
  - Fine-tuned with early stopping on both F1 and loss
  - Step-based evaluation every 250 steps
  - Best checkpoint (step 24,750) saved and evaluated
- Second-phase fine-tuning (see the configuration sketch below):
  - Performed on the full merged dataset for 2 epochs
  - Used a cosine learning rate scheduler and warm-up steps
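A minimal sketch of what this training setup could look like with the Hugging Face `Trainer`. Only the learning rate, cosine schedule, 2 epochs, 250-step evaluation cadence, and early stopping come from the description above; everything marked "assumed" in the comments (batch size, warm-up length, patience, output path, dataset variables) is illustrative.

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=2
)

def compute_metrics(eval_pred):
    # Macro F1, matching the metric used for checkpoint selection.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds, average="macro")}

training_args = TrainingArguments(
    output_dir="./checkpoints",      # assumed
    learning_rate=2.66e-6,           # from the LR Finder
    num_train_epochs=2,              # second-phase fine-tuning
    lr_scheduler_type="cosine",      # cosine schedule
    warmup_steps=500,                # assumed warm-up length
    eval_strategy="steps",           # `evaluation_strategy` on older transformers
    eval_steps=250,                  # evaluate every 250 steps
    save_strategy="steps",
    save_steps=250,
    load_best_model_at_end=True,     # keep the best checkpoint
    metric_for_best_model="f1",
    per_device_train_batch_size=16,  # assumed
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,     # tokenized splits assumed to be prepared elsewhere
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # assumed patience
)
# trainer.train()
```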
## Performance (on the full merged test set)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Female | 0.70 | 0.65 | 0.68 | 591,027 |
| Male | 0.68 | 0.72 | 0.70 | 591,027 |
| Macro avg | 0.69 | 0.69 | 0.69 | 1,182,054 |

Overall accuracy: 0.69 (1,182,054 samples)
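The layout above follows scikit-learn's `classification_report`; as an assumption about tooling (the card does not state how the numbers were produced), an equivalent report can be generated like this:

```python
from sklearn.metrics import accuracy_score, classification_report

# Illustrative stand-ins for the real test labels and model predictions.
y_true = [0, 0, 1, 1, 1, 0]  # 0 = Female, 1 = Male
y_pred = [0, 1, 1, 1, 0, 0]

print(classification_report(y_true, y_pred, target_names=["Female", "Male"], digits=2))
print("Accuracy:", accuracy_score(y_true, y_pred))
```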
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "fc63/gender_prediction_model_from_text"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval().to(device)

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()
    confidence = round(probs[0][pred].item() * 100, 1)
    gender = "Female" if pred == 0 else "Male"
    return f"{gender} (Confidence: {confidence}%)"

sample_text = "I love writing in my journal every night. It helps me reflect on the day and plan for tomorrow."
print(predict(sample_text))
```
Output for this sample:

```
Female (Confidence: 84.1%)
```
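For quick experiments, the high-level `pipeline` API gives a roughly equivalent result. Note that the label mapping here is an assumption: the raw `LABEL_0`/`LABEL_1` names depend on the model's `id2label` config, with index 0 corresponding to "Female" in the example above.

```python
from transformers import pipeline

# use_fast=False mirrors the slow-tokenizer choice in the example above.
clf = pipeline(
    "text-classification",
    model="fc63/gender_prediction_model_from_text",
    use_fast=False,
)

print(clf("I love writing in my journal every night."))
# Hypothetical output shape: [{'label': 'LABEL_0', 'score': 0.84}]
```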
## Future Work & Limitations

I do not want to leave this model at 0.69 accuracy and F1.

As far as I can tell at this point, the model is biased toward predicting emotional, psychological, and introspective texts as female, while more direct, results-oriented writing is often predicted as male. Counteracting this pattern would require a large, carefully labeled dataset that works against it.

The datasets used to train this model had to be obtained from open-source platforms, which limited the range of accessible data. To make further progress, I would need to create and label a larger dataset myself, which requires a significant amount of time, effort, and cost.

Before moving to dataset creation, I plan to try a few more approaches on the current dataset. So far, alternative techniques have not improved the scores without causing overfitting. If the remaining methods also fail, the only step left will be building a new dataset, and that will likely be the point where I stop development, as it would be both labor-intensive and costly.
## Author & License

- Author: Furkan Çoban
- Project: CENG-481 Gender Prediction Model
- License: MIT