---
license: apache-2.0
language:
- en
base_model:
- facebook/wav2vec2-base-960h
pipeline_tag: audio-classification
library_name: transformers
tags:
- voice-gender-detection
- male
- female
- biology
- SFT
---

![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/C-lq4SZvqsDgoppDfY4oh.png)

# Common-Voice-Gender-Detection

> **Common-Voice-Gender-Detection** is a fine-tuned version of `facebook/wav2vec2-base-960h` for **binary audio classification**, specifically trained to detect speaker gender as **female** or **male**. This model leverages the `Wav2Vec2ForSequenceClassification` architecture for efficient and accurate voice-based gender classification.

> [!note]
> Wav2Vec2: Self-Supervised Learning for Speech Recognition: [https://arxiv.org/pdf/2006.11477](https://arxiv.org/pdf/2006.11477)

```py
Classification Report:

              precision    recall  f1-score   support

      female     0.9705    0.9916    0.9809      2622
        male     0.9943    0.9799    0.9870      3923

    accuracy                         0.9846      6545
   macro avg     0.9824    0.9857    0.9840      6545
weighted avg     0.9848    0.9846    0.9846      6545
```
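As a sanity check, the macro and weighted averages in the report follow directly from the per-class scores and supports. A minimal sketch, using the F1 values and supports copied from the table above:

```python
# Per-class F1 and support, taken from the classification report above
support = {"female": 2622, "male": 3923}
f1 = {"female": 0.9809, "male": 0.9870}

total = sum(support.values())  # 6545

# Macro average: unweighted mean of the per-class scores
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class weighted by its support (number of samples)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

# Both land close to the report's 0.9840 and 0.9846 rows; tiny differences
# come from rounding the per-class scores to four decimals.
print(round(macro_f1, 4), round(weighted_f1, 4))
```

The weighted average sits closer to the male F1 because the male class has roughly 1.5× the support of the female class.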

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/pBnrVbG8uyZYq6Nb4GOuG.png)

![download (1).png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/QtZdfaXE-W-C4QDZUWuVC.png)

---

## Label Space: 2 Classes

```
Class 0: female  
Class 1: male
```
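The class index is the position in the model's logit vector. As a dependency-free illustration of how a logit pair becomes the probability dict returned by the inference code below, here is a small sketch (the example logits are made up, not model output):

```python
import math

id2label = {0: "female", 1: "male"}

def logits_to_prediction(logits):
    """Numerically stable softmax over the logits, then map indices to labels."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {id2label[i]: round(e / total, 3) for i, e in enumerate(exps)}

# Example: a logit pair leaning toward class 0 ("female")
print(logits_to_prediction([2.0, -1.0]))  # → {'female': 0.953, 'male': 0.047}
```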

---

## Install Dependencies

```bash
pip install gradio transformers torch librosa hf_xet
```

---

## Inference Code

```python
import gradio as gr
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model and processor
model_name = "prithivMLmods/Common-Voice-Gender-Detection"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)

# Label mapping: class index -> label
id2label = {0: "female", 1: "male"}

def classify_audio(audio_path):
    # Load and resample audio to 16 kHz, the rate wav2vec2-base-960h expects
    speech, sample_rate = librosa.load(audio_path, sr=16000)

    # Process audio
    inputs = processor(
        speech,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=-1).squeeze().tolist()

    # Map class indices to labels with rounded probabilities
    prediction = {id2label[i]: round(p, 3) for i, p in enumerate(probs)}

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
    outputs=gr.Label(num_top_classes=2, label="Gender Classification"),
    title="Common Voice Gender Detection",
    description="Upload an audio clip to classify the speaker's gender as female or male."
)

if __name__ == "__main__":
    iface.launch()
```

---

## Demo Inference

> [!note]
> male

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/7woMf3_bgX_D99-1Uy3jH.mpga"></audio>

![Screenshot 2025-05-31 at 20-19-39 Common Voice Gender Detection.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/h1LqWmWbyi3ao2yvSWQSI.png)


> [!note]
> female

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/0d2rDf_DT-gjRWBwiPbm_.mpga"></audio>

![Screenshot 2025-05-31 at 20-21-57 Common Voice Gender Detection.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/TTAKrOOZ2sCS846wZWani.png)

--- 

## Intended Use

`Common-Voice-Gender-Detection` is designed for:

* **Speech Analytics** – Assist in analyzing speaker demographics in call centers or customer service recordings.
* **Conversational AI Personalization** – Adjust tone or dialogue based on gender detection for more personalized voice assistants.
* **Voice Dataset Curation** – Automatically tag or filter voice datasets by speaker gender for better dataset management.
* **Research Applications** – Enable linguistic and acoustic research involving gender-specific speech patterns.
* **Multimedia Content Tagging** – Automate metadata generation for gender identification in podcasts, interviews, or video content.
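For the voice dataset curation use case above, a minimal tagging-loop sketch. `classify_fn` stands in for any classifier returning a label-to-probability dict, such as the `classify_audio` function from the inference code; `tag_dataset` and its `threshold` parameter are illustrative helpers, not part of the model:

```python
from pathlib import Path

def tag_dataset(audio_dir, classify_fn, threshold=0.5):
    """Tag each .wav file in a directory with its most probable gender label.

    Files whose top probability falls below `threshold` are marked
    "uncertain" so they can be reviewed manually instead of mislabeled.
    """
    tags = {}
    for path in sorted(Path(audio_dir).glob("*.wav")):
        probs = classify_fn(str(path))  # e.g. {"female": 0.92, "male": 0.08}
        label = max(probs, key=probs.get)
        tags[path.name] = label if probs[label] >= threshold else "uncertain"
    return tags

# Example (with the inference code above loaded):
# tags = tag_dataset("my_voice_clips/", classify_audio)
```

Only `.wav` files are globbed here for simplicity; extend the pattern for MP3 or other formats librosa can decode.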