---
license: apache-2.0
language:
- en
base_model:
- facebook/wav2vec2-base-960h
pipeline_tag: audio-classification
library_name: transformers
tags:
- voice-gender-detection
- male
- female
- biology
- SFT
---

# Common-Voice-Gender-Detection
> **Common-Voice-Gender-Detection** is a fine-tuned version of `facebook/wav2vec2-base-960h` for **binary audio classification**, specifically trained to detect speaker gender as **female** or **male**. This model leverages the `Wav2Vec2ForSequenceClassification` architecture for efficient and accurate voice-based gender classification.

> [!note]
> wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations: [https://arxiv.org/pdf/2006.11477](https://arxiv.org/pdf/2006.11477)
```text
Classification Report:

              precision    recall  f1-score   support

      female     0.9705    0.9916    0.9809      2622
        male     0.9943    0.9799    0.9870      3923

    accuracy                         0.9846      6545
   macro avg     0.9824    0.9857    0.9840      6545
weighted avg     0.9848    0.9846    0.9846      6545
```


---
## Label Space: 2 Classes
```
Class 0: female
Class 1: male
```
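
The same mapping can usually be recovered from the checkpoint's configuration. The snippet below is a small sketch under the assumption that `id2label` was saved with the model; if it was not, fall back to the explicit mapping shown above.

```python
from transformers import AutoConfig

# Assumption: the uploaded config.json stores the label mapping;
# otherwise use the explicit mapping above (0 -> female, 1 -> male).
config = AutoConfig.from_pretrained("prithivMLmods/Common-Voice-Geneder-Detection")
print(config.id2label)   # expected: {0: 'female', 1: 'male'}
print(config.label2id)   # expected: {'female': 0, 'male': 1}
```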
---
## Install Dependencies
```bash
pip install gradio transformers torch librosa hf_xet
```
---
## Inference Code
```python
import gradio as gr
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model and feature extractor
model_name = "prithivMLmods/Common-Voice-Geneder-Detection"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "female",
    "1": "male"
}

def classify_audio(audio_path):
    # Load and resample audio to 16 kHz, the rate expected by the model
    speech, sample_rate = librosa.load(audio_path, sr=16000)

    # Convert the waveform into model inputs
    inputs = processor(
        speech,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map class probabilities back to label names
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
    outputs=gr.Label(num_top_classes=2, label="Gender Classification"),
    title="Common Voice Gender Detection",
    description="Upload an audio clip to classify the speaker's gender as female or male."
)

if __name__ == "__main__":
    iface.launch()
```
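
For quick checks without the Gradio app, the same checkpoint can also be loaded through the `transformers` audio-classification pipeline. The snippet below is a minimal sketch; `sample.wav` is a placeholder path, and the returned label names come from the checkpoint's config.

```python
from transformers import pipeline

# Minimal sketch: "sample.wav" is a placeholder for any local speech clip.
classifier = pipeline(
    "audio-classification",
    model="prithivMLmods/Common-Voice-Geneder-Detection"
)

# Returns the classes ranked by probability, e.g.
# [{'label': 'female', 'score': 0.98}, {'label': 'male', 'score': 0.02}]
print(classifier("sample.wav", top_k=2))
```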
---
## Demo Inference
> [!note]
> male

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/7woMf3_bgX_D99-1Uy3jH.mpga"></audio>

> [!note]
> female

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/0d2rDf_DT-gjRWBwiPbm_.mpga"></audio>

---
## Intended Use
`Common-Voice-Gender-Detection` is designed for:
* **Speech Analytics** – Assist in analyzing speaker demographics in call centers or customer service recordings.
* **Conversational AI Personalization** – Adjust tone or dialogue based on gender detection for more personalized voice assistants.
* **Voice Dataset Curation** – Automatically tag or filter voice datasets by speaker gender for better dataset management (see the batch-tagging sketch after this list).
* **Research Applications** – Enable linguistic and acoustic research involving gender-specific speech patterns.
* **Multimedia Content Tagging** – Automate metadata generation for gender identification in podcasts, interviews, or video content.
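
As a concrete example of the dataset-curation use case above, the sketch below walks a folder of `.wav` files and writes the top prediction for each clip to a CSV. The folder name, output file, and CSV layout are illustrative assumptions, not part of the model card.

```python
import csv
from pathlib import Path

from transformers import pipeline

# Hypothetical paths for illustration; adjust AUDIO_DIR and OUTPUT_CSV as needed.
AUDIO_DIR = Path("voice_dataset")
OUTPUT_CSV = "gender_tags.csv"

classifier = pipeline(
    "audio-classification",
    model="prithivMLmods/Common-Voice-Geneder-Detection"
)

with open(OUTPUT_CSV, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "label", "score"])
    for wav in sorted(AUDIO_DIR.glob("*.wav")):
        # Keep only the top prediction; label names follow the checkpoint's config.
        top = classifier(str(wav), top_k=1)[0]
        writer.writerow([wav.name, top["label"], round(top["score"], 3)])
```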