asadullah797 committed 7e41af2 (verified) · 1 parent: 1239ebe

Update README.md

Files changed (1): README.md +40 -4
README.md changed (hunk `@@ -9,7 +9,43 @@ tags:`). The change comes right after the YAML front matter, whose last lines (the `speaker-identification` tag and the closing `---`) are unchanged context.

Removed:

This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- Code: https://huggingface.co/asadullah797/ssl-semi-multitask
- Paper: [More Information Needed]
- Docs: https://github.com/asadullah797/ssl_semi-multitask/blob/main/README.md

Added:
Multitask Speech Model with Wav2Vec2

This repository contains a multitask learning pipeline built on top of Wav2Vec2, designed to jointly perform:

- Automatic Speech Recognition (ASR), trained with a character-level CTC loss
- Speaker Identification
- Emotion Recognition

The system is trained on a combined dataset of parallel data: utterances annotated with speech transcriptions, speaker identification labels, and emotion recognition labels.
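The README does not say how the three objectives are combined during joint training. One common formulation, given here only as an assumption, is a weighted sum of the CTC loss from the ASR head and cross-entropy losses from the two classification heads, where $\lambda_{\text{asr}}$, $\lambda_{\text{spk}}$, and $\lambda_{\text{emo}}$ are hypothetical weighting hyperparameters and $\mathbf{x}$ is the input waveform:

$$
\mathcal{L} = \lambda_{\text{asr}}\,\mathcal{L}_{\text{CTC}} + \lambda_{\text{spk}}\,\mathcal{L}_{\text{spk}} + \lambda_{\text{emo}}\,\mathcal{L}_{\text{emo}},
\qquad
\mathcal{L}_{t} = -\log p_\theta\!\left(y_{t} \mid \mathbf{x}\right),\quad t \in \{\text{spk}, \text{emo}\}
$$

Here $y_{\text{spk}}$ and $y_{\text{emo}}$ are the utterance's speaker and emotion labels. With all weights set to 1, this reduces to a plain sum of the three losses.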
📌 Features

Multitask model (`Wav2Vec2MultiTasks`) with a shared Wav2Vec2 encoder and separate heads for:

- Speech Recognition (CTC)
- Speaker classification
- Emotion classification
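The README does not describe how the CTC head's frame-level outputs become text. Assuming greedy (best-path) decoding, which is one common choice, the decoder takes each frame's most likely token, merges consecutive repeats, and drops blank tokens. The sketch below uses plain Python lists in place of model logits. The toy vocabulary, blank index 0, `|` word delimiter, and the helper names `ctc_greedy_decode` and `one_hot_frames` are all hypothetical.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    id_to_char: Sequence[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Most likely token id for every frame (each row holds one frame's scores).
    best_path = [max(range(len(row)), key=lambda j: row[j]) for row in frame_scores]

    decoded: List[str] = []
    previous: Optional[int] = None
    for token_id in best_path:
        # 2. Skip a token that repeats the previous frame; 3. skip blanks.
        if token_id != previous and token_id != blank_id:
            decoded.append(id_to_char[token_id])
        previous = token_id

    # The (assumed) word-delimiter token marks spaces between words.
    return "".join(decoded).replace(word_delimiter, " ").strip()


def one_hot_frames(ids: Sequence[int], vocab_size: int, peak: float = 0.9) -> List[List[float]]:
    """Hypothetical helper: fake per-frame scores that peak at the given ids."""
    rest = (1.0 - peak) / (vocab_size - 1)
    return [[peak if j == i else rest for j in range(vocab_size)] for i in ids]


# Hypothetical toy vocabulary; index 0 is the CTC blank.
VOCAB = ["<blank>", "h", "i", "|", "y", "o"]

if __name__ == "__main__":
    frames = one_hot_frames([1, 1, 0, 2, 3, 4, 5, 5], len(VOCAB))
    print(ctc_greedy_decode(frames, VOCAB))  # hi yo
```

A blank between two identical characters is what lets CTC emit double letters: frames `[o, blank, o]` decode to `oo`, whereas `[o, o]` collapse to a single `o`.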
Custom data preprocessing:

- Cleans transcripts (removes punctuation and special characters)
- Converts numbers into words
- Builds a vocabulary and tokenizer
- Filters short/invalid audio
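The README lists these preprocessing steps without code. The following is a minimal, standard-library-only sketch of how they could fit together, not the repository's implementation. Several details are assumptions made for illustration: the function names, the number-to-words helper (which only covers integers below one million), the kept character set, the `|` word delimiter with `[UNK]`/`[PAD]` tokens (a common convention in Wav2Vec2 CTC fine-tuning), and the 1-second minimum duration. Numbers are converted before punctuation is stripped so that digits are not discarded.

```python
import re
from typing import Dict, Iterable, List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Hypothetical helper: spell out 0 <= n < 1_000_000 in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")
    raise ValueError("number_to_words only handles 0 <= n < 1_000_000")


def clean_transcript(text: str) -> str:
    """Lowercase, spell out numbers, then drop punctuation and special characters."""
    text = text.lower()
    # Drop thousands separators such as "1,000" so the number is read as one value.
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    # Convert numbers first; otherwise the next step would delete the digits.
    text = re.sub(r"\d+", lambda m: " " + number_to_words(int(m.group())) + " ", text)
    # Assumed character set: a-z, apostrophes (for contractions), and spaces.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Character vocabulary for CTC: spaces map to '|', plus [UNK] and [PAD]."""
    chars = sorted({ch for t in transcripts for ch in t})
    vocab = {("|" if ch == " " else ch): i for i, ch in enumerate(chars)}
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Tokenize characters to ids, using [UNK] for anything out of vocabulary."""
    return [vocab.get("|" if ch == " " else ch, vocab["[UNK]"]) for ch in text]


def keep_example(num_samples: int, sampling_rate: int, transcript: str,
                 min_seconds: float = 1.0) -> bool:
    """Reject invalid or too-short clips and empty cleaned transcripts."""
    if sampling_rate <= 0 or num_samples <= 0:
        return False
    return num_samples / sampling_rate >= min_seconds and bool(transcript)


if __name__ == "__main__":
    print(clean_transcript("Hello, World! I have 21 cats."))  # hello world i have twenty one cats
```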
Training, validation, and test splits, with data collators for CTC.
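The README mentions the splits and CTC collators but shows neither. The following is a sketch under stated assumptions. `split_dataset` performs a seeded shuffle with hypothetical 80/10/10 ratios. `collate_ctc` pads a batch using plain lists rather than tensors: audio is zero-padded with an attention mask, and CTC label ids are padded with -100, the value Hugging Face's `Wav2Vec2ForCTC` treats as ignored when computing its loss. The per-utterance `speaker_id` and `emotion_id` field names are hypothetical, and those labels need no padding.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_dataset(items: Sequence, train: float = 0.8, val: float = 0.1,
                  seed: int = 42) -> Tuple[list, list, list]:
    """Shuffle with a fixed seed, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def collate_ctc(batch: List[Dict], label_pad: int = -100) -> Dict[str, list]:
    """Pad audio and CTC label sequences to the longest item in the batch."""
    max_audio = max(len(ex["input_values"]) for ex in batch)
    max_label = max(len(ex["labels"]) for ex in batch)
    out: Dict[str, list] = {"input_values": [], "attention_mask": [], "labels": [],
                            "speaker_id": [], "emotion_id": []}
    for ex in batch:
        audio, labels = ex["input_values"], ex["labels"]
        pad_audio = max_audio - len(audio)
        out["input_values"].append(list(audio) + [0.0] * pad_audio)
        # 1 marks real samples, 0 marks padding.
        out["attention_mask"].append([1] * len(audio) + [0] * pad_audio)
        # Padded label positions get label_pad so a masking CTC loss skips them.
        out["labels"].append(list(labels) + [label_pad] * (max_label - len(labels)))
        # Utterance-level class labels (hypothetical field names) are not padded.
        out["speaker_id"].append(ex["speaker_id"])
        out["emotion_id"].append(ex["emotion_id"])
    return out
```

Keeping the shuffle seed fixed makes the three splits reproducible across runs, so evaluation numbers stay comparable.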
Evaluation metrics:

- Character Error Rate (CER) for character-level speech recognition
- Accuracy for speaker and emotion classification
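The README names the metrics but not how they are computed. Below is a dependency-free sketch of the standard definitions, not the repository's code. CER is the character-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference characters, pooled here over the whole corpus. Accuracy is the fraction of utterances whose predicted class matches the label. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters, using two rolling DP rows."""
    prev = list(range(len(hyp) + 1))  # distance from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion of r
                            curr[j - 1] + 1,          # insertion of h
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return edits / sum(len(r) for r in references)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of utterances whose predicted class equals the true class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)
```

For example, `cer(["hello world"], ["helo world"])` counts one deletion over 11 reference characters (spaces included). Pooling edits over the corpus, rather than averaging per-utterance CERs, keeps short utterances from dominating the score.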