prithivMLmods committed on
Commit bf8d2df · verified · 1 Parent(s): cd31b9a

Update README.md

  1. README.md +95 -0
README.md CHANGED
@@ -11,8 +11,16 @@ tags:
  - male
  - female
  - biology
+ - SFT
  ---

+ # Common-Voice-Gender-Detection
+
+ > **Common-Voice-Gender-Detection** is a fine-tuned version of `facebook/wav2vec2-base-960h` for **binary audio classification**, specifically trained to detect speaker gender as **female** or **male**. This model leverages the `Wav2Vec2ForSequenceClassification` architecture for efficient and accurate voice-based gender classification.
+
+ > **Wav2Vec2**: Self-Supervised Learning for Speech Recognition
+ > [https://arxiv.org/pdf/2006.11477](https://arxiv.org/pdf/2006.11477)
+
  ```py
  Classification Report:

@@ -29,3 +37,90 @@ weighted avg 0.9848 0.9846 0.9846 6545
  ![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/pBnrVbG8uyZYq6Nb4GOuG.png)

  ![download (1).png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/QtZdfaXE-W-C4QDZUWuVC.png)
+
+ ---
+
+ ## Label Space: 2 Classes
+
+ ```
+ Class 0: female
+ Class 1: male
+ ```
+
+ ---
+
+ ## Install Dependencies
+
+ ```py
+ %%capture
+ !pip install -q gradio transformers torch librosa hf_xet
+ ```
+
+ ---
+
+ ## Inference Code
+
+ ```python
+ import gradio as gr
+ from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
+ import torch
+ import librosa
+
+ # Load model and processor
+ model_name = "prithivMLmods/Common-Voice-Geneder-Detection"
+ model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
+ processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
+
+ # Label mapping
+ id2label = {
+     "0": "female",
+     "1": "male"
+ }
+
+ def classify_audio(audio_path):
+     # Load and resample audio to 16kHz
+     speech, sample_rate = librosa.load(audio_path, sr=16000)
+
+     # Process audio
+     inputs = processor(
+         speech,
+         sampling_rate=sample_rate,
+         return_tensors="pt",
+         padding=True
+     )
+
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+         probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
+
+     prediction = {
+         id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
+     }
+
+     return prediction
+
+ # Gradio Interface
+ iface = gr.Interface(
+     fn=classify_audio,
+     inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
+     outputs=gr.Label(num_top_classes=2, label="Gender Classification"),
+     title="Common Voice Gender Detection",
+     description="Upload an audio clip to classify the speaker's gender as female or male."
+ )
+
+ if __name__ == "__main__":
+     iface.launch()
+ ```
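
The post-processing step inside `classify_audio` (softmax over the two logits, then a lookup in the label mapping) can be sketched without loading the model. This is a minimal pure-Python illustration, not the README's code: `logits_to_prediction` is a hypothetical helper, and the logits passed to it are made-up values rather than real model output.

```python
import math

# Label mapping as listed in the README's "Label Space" section.
ID2LABEL = {0: "female", 1: "male"}


def logits_to_prediction(logits):
    """Convert raw class logits into a {label: probability} dict via softmax."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: round(e / total, 3) for i, e in enumerate(exps)}


# Hypothetical logits, chosen only for illustration.
example = logits_to_prediction([2.0, 0.0])
# The most probable class is the key with the largest probability.
top_label = max(example, key=example.get)
```

The rounding to three decimals mirrors the `round(probs[i], 3)` call in the inference code, so the two probabilities may not sum to exactly 1.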
+
+ ---
+
+ ## Intended Use
+
+ `Common-Voice-Gender-Detection` is designed for:
+
+ * **Speech Analytics** – Assist in analyzing speaker demographics in call centers or customer service recordings.
+ * **Conversational AI Personalization** – Adjust tone or dialogue based on gender detection for more personalized voice assistants.
+ * **Voice Dataset Curation** – Automatically tag or filter voice datasets by speaker gender for better dataset management.
+ * **Research Applications** – Enable linguistic and acoustic research involving gender-specific speech patterns.
+ * **Multimedia Content Tagging** – Automate metadata generation for gender identification in podcasts, interviews, or video content.
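
For the dataset-curation use case listed above, one possible approach is to tag a clip only when the top probability clears a confidence threshold and to mark it for review otherwise. The sketch below assumes per-clip outputs shaped like the dict returned by `classify_audio`; the helper name `tag_clips`, the `0.9` threshold, the `"uncertain"` tag, and the file names are all hypothetical and do not come from the README.

```python
def tag_clips(predictions, threshold=0.9):
    """Assign a label per clip, or "uncertain" when confidence is below threshold."""
    tags = {}
    for clip, probs in predictions.items():
        # Pick the label with the highest probability for this clip.
        label = max(probs, key=probs.get)
        tags[clip] = label if probs[label] >= threshold else "uncertain"
    return tags


# Made-up probabilities in the same shape as classify_audio's return value.
predictions = {
    "clip_a.wav": {"female": 0.981, "male": 0.019},
    "clip_b.wav": {"female": 0.452, "male": 0.548},
}
tags = tag_clips(predictions)
```

A suitable threshold depends on the data and on how costly a wrong tag is, so the value here is only a placeholder.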