RwandaNameGenderModel

RwandaNameGenderModel is a machine learning model that predicts gender from Rwandan names: a first name, a surname, or both in any order. It combines character-level n-grams with a logistic regression classifier to give fast, interpretable predictions, reaching over 96% accuracy on both the validation and test sets.


🧠 Model Overview

  • Type: Classic ML (Logistic Regression)
  • Input: Rwandan name (flexible: single or full name)
  • Vectorization: Character-level n-grams (2–3 chars)
  • Framework: scikit-learn
  • Training Set: 66,735 names (about 80% of the 83,419 names in the dataset)
  • Validation/Test Accuracy: 96.72% / 96.64%
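
As an illustration of the vectorization step listed above, the sketch below enumerates the 2- and 3-character n-grams of a name in plain Python. The helper `char_ngrams` is hypothetical and not part of this repository; lowercasing and whitespace normalization are assumptions, since the exact vectorizer settings are not documented here.

```python
def char_ngrams(name, min_n=2, max_n=3):
    """List the character n-grams of a name (hypothetical helper).

    Assumption: names are lowercased and runs of whitespace are collapsed
    before n-grams are taken; the real vectorizer settings may differ.
    """
    text = " ".join(name.lower().split())
    return [
        text[i:i + n]
        for n in range(min_n, max_n + 1)      # all 2-grams first, then all 3-grams
        for i in range(len(text) - n + 1)     # slide a window of width n
    ]


print(char_ngrams("Gabriel"))
# ['ga', 'ab', 'br', 'ri', 'ie', 'el', 'gab', 'abr', 'bri', 'rie', 'iel']
```

In a bag-of-n-grams setup like this, each distinct n-gram becomes one feature, and a logistic regression learns one weight per feature, which is what makes the predictions easy to inspect.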

πŸ“ Project Structure

RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt

🚀 Quickstart

1. Install requirements

pip install -r requirements.txt

2. Train the model

python train.py
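
The training-set size quoted in the overview is consistent with an 80/20 split of the 83,419 names. The sketch below assumes, without confirmation from train.py, that the held-out 20% is then divided evenly into validation and test sets. The function `split_80_10_10` and the seed value are hypothetical.

```python
from sklearn.model_selection import train_test_split


def split_80_10_10(rows, seed=42):
    """Split rows into train/validation/test parts (assumed 80/10/10)."""
    # First hold out 20% of the rows, then halve that hold-out set.
    train, holdout = train_test_split(rows, test_size=0.2, random_state=seed)
    val, test = train_test_split(holdout, test_size=0.5, random_state=seed)
    return train, val, test


# Placeholder row indices standing in for the 83,419 names in the dataset.
train_idx, val_idx, test_idx = split_80_10_10(list(range(83_419)))
print(len(train_idx), len(val_idx), len(test_idx))
# 66735 8342 8342
```

The first number matches the 66,735 training names reported above; the validation and test sizes are only what this assumed split would produce.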

3. Predict gender from a name using the script

Run interactive inference with:

python inference.py

4. Predict gender from a name using Python code

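from joblib import load

# Load the trained classifier and the fitted n-gram vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    """Return the predicted gender label for a single name string."""
    # The vectorizer expects an iterable of strings, so wrap the name in a list
    X = vectorizer.transform([name])
    return model.predict(X)[0]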

# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")                 # Output: "male"
predict_gender("Baziramwabo")             # Output: "male"
predict_gender("Baziramwabo Gabriel")     # Output: "male"
predict_gender("Gabriel Baziramwabo")     # Output: "male"
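
Because the classifier is a logistic regression, it can also report a probability alongside the label, which is useful for flagging uncertain names. The sketch below is a minimal illustration: `predict_with_confidence` is a hypothetical helper, the toy names and labels are made up so the block runs without the saved model files, and `CountVectorizer` is an assumption about how the n-gram features are built.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, NOT taken from rwandan_names.csv.
toy_names = ["Gabriel", "Jean", "Eric", "Emmanuel",
             "Marie", "Claudine", "Alice", "Diane"]
toy_labels = ["male", "male", "male", "male",
              "female", "female", "female", "female"]

# Assumption: character 2- and 3-grams counted with CountVectorizer.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(toy_names), toy_labels)


def predict_with_confidence(name, model, vectorizer):
    """Return (label, probability) for the most likely class."""
    X = vectorizer.transform([name])
    probs = model.predict_proba(X)[0]   # one probability per class
    best = probs.argmax()               # index of the most likely class
    return model.classes_[best], float(probs[best])


label, confidence = predict_with_confidence("Gabriel", model, vectorizer)
print(label, round(confidence, 3))
```

With the real artifacts, the same helper could be called with the `model` and `vectorizer` objects loaded from the model/ directory; a caller might then treat predictions below some chosen probability as "unknown".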

📈 Performance

Dataset      Accuracy   Precision   Recall    F1-Score
Validation   96.72%     96.90%      96.53%    96.72%
Test         96.64%     96.94%      96.34%    96.64%

Metrics are logged to logs/metrics_log.txt and also written in TensorBoard format.
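
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages in the table; `f1_from` is a hypothetical helper, and small differences in the last digit are expected because the inputs are already rounded.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


val_f1 = f1_from(96.90, 96.53)    # validation precision and recall, in percent
test_f1 = f1_from(96.94, 96.34)   # test precision and recall, in percent
print(f"{val_f1:.2f} {test_f1:.2f}")
# 96.71 96.64
```

Both values agree with the reported F1 scores to within 0.01 percentage points.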


🌍 Use Cases

  • Demographic analysis
  • Smart form processing
  • Voice assistant personalization
  • NLP preprocessing for Rwandan corpora

πŸ›‘οΈ Ethical Note

This model predicts binary gender from patterns in names, so its predictions may not match a person's self-identified gender. It should not be used in sensitive contexts without consent.


📄 License

This project is maintained by Gabriel Baziramwabo and is open for research and educational use. For commercial use, please contact the author.


🤝 Contributing

We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!


🔗 Links
