RwandaNameGenderModel
RwandaNameGenderModel is a machine learning model that predicts gender from Rwandan names, whether given as a first name, a surname, or both in any order. It uses a character-level n-gram approach with a logistic regression classifier to provide fast, interpretable predictions, with over 96% accuracy on both the validation and test sets.
Model Overview
- Type: Classic ML (Logistic Regression)
- Input: Rwandan name (flexible: single or full name)
- Vectorization: Character-level n-grams (2-3 chars)
- Framework: scikit-learn
- Training Set: 66,735 names (out of 83,419)
- Validation/Test Accuracy: ~96.6%
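The overview above can be sketched as a small scikit-learn example. This is a minimal illustration, not the project's actual `train.py`: the choice of `CountVectorizer`, the `max_iter` value, and the six toy names are assumptions made for the example, and the toy names are placeholders rather than rows from `dataset/rwandan_names.csv`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy placeholder data for illustration only (not from the real dataset).
names = ["Gabriel", "Jean", "Eric", "Marie", "Alice", "Claudine"]
labels = ["male", "male", "male", "female", "female", "female"]

# Character-level 2- and 3-grams. CountVectorizer is an assumption here;
# the project may use a different vectorizer class or different settings.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X = vectorizer.fit_transform(names)

# Plain logistic regression on the sparse n-gram counts.
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)


def predict_gender(name):
    """Return the predicted label for a single name string."""
    return model.predict(vectorizer.transform([name]))[0]


print(predict_gender("Gabriel"))
```

With only six training names the prediction itself carries no meaning; the point is the shape of the pipeline: text in, sparse n-gram counts, linear classifier out.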
Project Structure
RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
Quickstart
1. Install requirements
pip install -r requirements.txt
2. Train the model
python train.py
3. Predict gender from a name using the script
Run interactive inference with:
python inference.py
4. Predict gender from a name using Python code
from joblib import load
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")
def predict_gender(name):
    # Vectorize the single name, then return the predicted label
    X = vectorizer.transform([name])
    return model.predict(X)[0]
# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")  # Output: "male"
predict_gender("Baziramwabo")  # Output: "male"
predict_gender("Baziramwabo Gabriel")  # Output: "male"
predict_gender("Gabriel Baziramwabo")  # Output: "male"
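One plausible reason name order matters so little can be shown with a pure-Python sketch. It assumes the vectorizer treats the whole input as a single lowercase character sequence and extracts 2- and 3-grams from it; the helper `char_ngrams` is hypothetical and only mimics that behaviour for illustration.

```python
def char_ngrams(text, n_min=2, n_max=3):
    """Return the set of lowercase character n-grams of length n_min..n_max."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


a = char_ngrams("Gabriel Baziramwabo")
b = char_ngrams("Baziramwabo Gabriel")

shared = a & b          # n-grams present in both orderings
only_one_order = a ^ b  # n-grams present in exactly one ordering

print(len(shared), "n-grams shared by both orders")
print(sorted(only_one_order))
```

Under this assumption, the only features that differ between the two orderings are the n-grams that span the space between the names; every n-gram inside a name is shared.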
Performance
Dataset | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Validation | 96.72% | 96.90% | 96.53% | 96.72% |
Test | 96.64% | 96.94% | 96.34% | 96.64% |
Metrics are logged both to logs/metrics_log.txt and in TensorBoard format.
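As a quick sanity check on the table, F1 can be recomputed as the harmonic mean of precision and recall. This sketch assumes the reported F1 is derived from the reported precision and recall in that way; the README does not say how the metrics are averaged, so small differences from rounding or averaging are expected. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both given in the same unit)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "Validation": (96.90, 96.53, 96.72),
    "Test": (96.94, 96.34, 96.64),
}

for split, (p, r, f1) in reported.items():
    print(f"{split}: recomputed F1 = {f1_from(p, r):.2f}%, reported = {f1:.2f}%")
```

The recomputed values land within about 0.01 percentage points of the reported ones, which is consistent with the table's rounded inputs.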
Use Cases
- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora
Ethical Note
This model predicts binary gender from patterns in names, so its predictions may not reflect a person's self-identified gender. It should not be used in sensitive contexts without consent.
License
This project is maintained by Gabriel Baziramwabo and is open for research and educational use. For commercial use, please contact the author.
Contributing
We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!