asadullah797/ssl-semi-multitask

Audio Classification

automatic-speech-recognition

emotion-recognition

speaker-identification


asadullah797 committed 9 days ago

Commit 7e41af2 · verified · 1 Parent(s): 1239ebe

Update README.md

Files changed (1)

README.md +40 -4


@@ -9,7 +9,43 @@ tags:
 - speaker-identification
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: https://huggingface.co/asadullah797/ssl-semi-multitask
-- Paper: [More Information Needed]
-- Docs: https://github.com/asadullah797/ssl_semi-multitask/blob/main/README.md

+Multitask Speech Model with Wav2Vec2
+This repository contains a multitask learning pipeline built on top of Wav2Vec2, designed to jointly perform:
+- Automatic Speech Recognition (ASR), trained with a character-level CTC loss
+- Speaker Identification
+- Emotion Recognition
+The system is trained on a combined dataset with parallel annotations: speech transcriptions, speaker identification labels, and emotion recognition labels.
+📌 Features
+- Multitask model (`Wav2Vec2MultiTasks`) with a shared Wav2Vec2 encoder and separate heads for:
+  - Speech recognition (CTC)
+  - Speaker classification
+  - Emotion classification
+- Custom data preprocessing:
+  - Cleans transcripts (removes punctuation and special characters)
+  - Converts numbers into words
+  - Builds a vocabulary and tokenizer
+  - Filters short or invalid audio
+- Training, validation, and test splits with collators for CTC.
+- Evaluation metrics:
+  - Character Error Rate (CER) for speech recognition
+  - Accuracy for speaker and emotion classification
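The custom preprocessing steps listed under Features (transcript cleaning, number-to-word conversion, vocabulary building, and filtering of short or invalid audio) can be sketched roughly as follows. This is an illustrative sketch and not code from the commit: every function name, the special tokens `[PAD]`, `[UNK]`, and `|`, the allowed character set, and the 0.5-second minimum duration are assumptions made for the example.

```python
import re

# All names and thresholds below are hypothetical; the repository's own
# preprocessing code is not shown in this diff.
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0..999 in English; larger values are read digit by digit."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + number_to_words(rest)
        return _ONES[hundreds] + " hundred" + tail
    return " ".join(_ONES[int(d)] for d in str(n))


def clean_transcript(text: str) -> str:
    """Lowercase, spell out digit runs, drop punctuation and special characters."""
    text = text.lower()
    text = re.sub(r"\d+", lambda m: f" {number_to_words(int(m.group()))} ", text)
    text = re.sub(r"[^a-z' ]+", " ", text)    # assumed character set: a-z, ', space
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def build_vocab(transcripts):
    """Map each character to an id; '|' stands in for the space between words."""
    chars = sorted(set("".join(transcripts)) - {" "})
    vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}  # assumed special tokens
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def keep_example(num_samples: int, transcript: str,
                 sample_rate: int = 16000, min_seconds: float = 0.5) -> bool:
    """Keep clips that have a non-empty transcript and enough audio."""
    return bool(transcript) and num_samples >= min_seconds * sample_rate
```

For example, `clean_transcript("I have 21 cats, OK?")` yields `"i have twenty one cats ok"`, and the resulting vocabulary dictionary could be written to a JSON file for a character-level CTC tokenizer to load.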
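The two evaluation metrics named above can be illustrated with a small, dependency-free sketch. It is not taken from the repository: the function names are hypothetical, and the CER here is pooled over the whole corpus (total edits divided by total reference characters), which is one common convention; the diff does not say which convention the author uses.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level character error rate: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


def accuracy(predictions, labels) -> float:
    """Fraction of predictions equal to their label (speaker or emotion ids)."""
    if len(labels) == 0:
        raise ValueError("no labels given")
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)
```

With this definition, `cer(["hello"], ["hallo"])` is one substitution over five reference characters, and `accuracy([0, 1, 1], [0, 1, 0])` counts two correct predictions out of three.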