asadullah797/ssl-semi-multitask

Audio Classification

automatic-speech-recognition

emotion-recognition

speaker-identification


asadullah797 committed 6 days ago

Commit 972d2ba · verified · 1 Parent(s): 7e41af2

Push model using huggingface_hub.

Files changed (2)

README.md +4 -40
model.safetensors +1 -1

README.md CHANGED

@@ -9,43 +9,7 @@ tags:
 - speaker-identification
 ---
-Multitask Speech Model with Wav2Vec2
-This repository contains a multitask learning pipeline built on top of Wav2Vec2
-, designed to jointly perform:
-Automatic Speech Recognition (ASR) (character-level CTC loss)
-Speaker Identification
-Emotion Recognition
-The system is trained on a combination of training dataset with parallel data from speech transcriptions, speaker identification and emotion recognition labels.
-📌 Features
-Multitask model (Wav2Vec2MultiTasks) with shared Wav2Vec2 encoder and separate heads for:
-Speech Recognition (CTC)
-Speaker classification
-Emotion classification
-Custom data preprocessing:
-Cleans transcripts (removes punctuation & special characters)
-Converts numbers into words
-Builds a vocabulary and tokenizer
-Filters short/invalid audio
-Training, validation, and test splits with collators for CTC.
-Evaluation metrics:
-Character Error Rate (CER) for character recognition
-Accuracy for speaker and emotion classification
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: https://huggingface.co/asadullah797/ssl-semi-multitask
+- Paper: [More Information Needed]
+- Docs: https://github.com/asadullah797/ssl_semi-multitask/blob/main/README.md
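
Reviewer note: the README text removed in this hunk names Character Error Rate (CER) as the ASR metric but does not show how it is computed. The sketch below is one common corpus-level definition (total character edits divided by total reference characters), written with the standard library only. The function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the repository; the author's evaluation code may normalize text or aggregate differently.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference character
                    curr[j - 1] + 1,     # insertion of a hypothesis character
                    prev[j - 1] + cost,  # substitution, or match when cost is 0
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(character_error_rate(["hello"], ["hallo"]))
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would; either choice is defensible as long as it is reported.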

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ae533279a661450c6fb7b60a7e2052b262bd0260bf7ca144e05ed7c7bed109bb
+oid sha256:c56ddf35a0ec7104eed113e1ffd022ba1e38a35e77403b71e06a29be01dd3791
 size 378804760
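
Reviewer note: the pointer above changes only the `oid`, while `size` stays at 378804760 bytes, which is what one would expect when weights are replaced without changing the architecture. A downloaded copy of the weights can be checked against the pointer by comparing its byte size and SHA-256 digest. The following is a minimal sketch under that assumption; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage note is an assumption as well.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; skips hashing a wrong-sized file
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    # Pointer content after this commit, copied from the diff above.
    new_pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c56ddf35a0ec7104eed113e1ffd022ba1e38a35e77403b71e06a29be01dd3791\n"
        "size 378804760\n"
    )
    print(new_pointer["algo"], new_pointer["size"])
```

With a local copy of the weights (the path `model.safetensors` is assumed here), `matches_pointer("model.safetensors", new_pointer)` would report whether the file corresponds to commit 972d2ba rather than its parent.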