---
license: mit
pipeline_tag: audio-classification
tags:
- automatic-speech-recognition
- emotion-recognition
- speaker-identification
language:
- en
base_model:
- facebook/wav2vec2-base
---
Multitask Speech Model with Wav2Vec2
This repository contains a multitask learning pipeline built on top of Wav2Vec2, designed to jointly perform:
Automatic Speech Recognition (ASR) (character-level CTC loss)
Speaker Identification
Emotion Recognition
The system is trained on a combined dataset in which each utterance carries parallel annotations: a speech transcription, a speaker identity label, and an emotion label.
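A minimal PyTorch sketch of the joint setup described above: one shared encoder feeding a frame-level CTC head and two utterance-level classification heads. This is not the repository's `Wav2Vec2MultiTasks` class. The class name `MultiTaskSketch`, the mean pooling over frames, the equal weighting of the three losses, and the choice of index 0 as the CTC blank are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskSketch(nn.Module):
    """Hypothetical sketch: a shared encoder with three task-specific heads."""

    def __init__(self, encoder, hidden_size, vocab_size, num_speakers, num_emotions):
        super().__init__()
        # Any module that maps its input to (batch, frames, hidden_size);
        # in the real pipeline this would be the Wav2Vec2 encoder.
        self.encoder = encoder
        self.ctc_head = nn.Linear(hidden_size, vocab_size)
        self.speaker_head = nn.Linear(hidden_size, num_speakers)
        self.emotion_head = nn.Linear(hidden_size, num_emotions)

    def forward(self, inputs):
        hidden = self.encoder(inputs)   # (batch, frames, hidden_size)
        pooled = hidden.mean(dim=1)     # assumption: mean pooling over frames
        return {
            "ctc_logits": self.ctc_head(hidden),          # per-frame character logits
            "speaker_logits": self.speaker_head(pooled),  # one prediction per utterance
            "emotion_logits": self.emotion_head(pooled),
        }


def multitask_loss(out, targets, target_lengths, speaker_ids, emotion_ids, blank_id=0):
    """Sum of CTC and two cross-entropy losses (equal weights are an assumption)."""
    # F.ctc_loss expects log-probabilities shaped (frames, batch, vocab).
    log_probs = F.log_softmax(out["ctc_logits"], dim=-1).transpose(0, 1)
    batch, frames = out["ctc_logits"].shape[:2]
    # Simplification: treat every utterance as using all frames (no padding mask).
    input_lengths = torch.full((batch,), frames, dtype=torch.long)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank_id, zero_infinity=True)
    spk = F.cross_entropy(out["speaker_logits"], speaker_ids)
    emo = F.cross_entropy(out["emotion_logits"], emotion_ids)
    return ctc + spk + emo
```

In practice the three losses are often weighted differently, and the frame lengths would come from the encoder's attention mask rather than being assumed full.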
📌 Features
Multitask model (Wav2Vec2MultiTasks) with a shared Wav2Vec2 encoder and separate heads for:
Speech Recognition (CTC)
Speaker classification
Emotion classification
Custom data preprocessing:
Cleans transcripts (removes punctuation & special characters)
Converts numbers into words
Builds a vocabulary and tokenizer
Filters short/invalid audio
Training, validation, and test splits with collators for CTC.
Evaluation metrics:
Character Error Rate (CER) for speech recognition
Accuracy for speaker and emotion classification
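
The transcript preprocessing and evaluation metrics listed above can be sketched in plain Python. This is an illustrative sketch, not the repository's code: all function names are hypothetical, numbers are spelled out digit by digit (a real pipeline may convert whole numbers such as "42" to "forty two"), and the special tokens in the vocabulary are assumed.

```python
import re

# Assumption: digits are spelled out one at a time.
_DIGITS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]


def clean_transcript(text):
    """Lowercase, spell out digits, and drop punctuation and special characters."""
    text = text.lower()
    text = re.sub(r"\d", lambda m: f" {_DIGITS[int(m.group())]} ", text)
    text = re.sub(r"[^a-z' ]", " ", text)   # keep letters, apostrophes, spaces
    return " ".join(text.split())           # collapse repeated whitespace


def build_vocab(transcripts):
    """Map each character to an id; the special tokens here are assumptions."""
    vocab = {"<pad>": 0, "<unk>": 1, "|": 2}   # "|" stands in for the word boundary
    for ch in sorted({c for t in transcripts for c in t if c != " "}):
        vocab[ch] = len(vocab)
    return vocab


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Character Error Rate: total edit distance over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total if total else 0.0


def accuracy(labels, predictions):
    """Fraction of predictions that match their labels."""
    if not labels:
        return 0.0
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)
```

For example, `clean_transcript("Hello, World! 42")` returns `"hello world four two"`, and `cer(["abc"], ["abd"])` is one substitution over three reference characters.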