---
license: mit
pipeline_tag: audio-classification
tags:
- automatic-speech-recognition
- emotion-recognition
- speaker-identification
language:
- en
base_model:
- facebook/wav2vec2-base
---

Multitask Speech Model with Wav2Vec2

This repository contains a multitask learning pipeline built on top of Wav2Vec2, designed to jointly perform:

Automatic Speech Recognition (ASR) with a character-level CTC loss

Speaker Identification

Emotion Recognition

The system is trained on data with parallel annotations: speech transcriptions, speaker identification labels, and emotion recognition labels.

📌 Features

Multitask model (Wav2Vec2MultiTasks) with a shared Wav2Vec2 encoder and separate heads for:

Speech Recognition (CTC)

Speaker classification

Emotion classification
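The shared-encoder layout above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the constructor arguments, the head names, and the use of mean pooling for the two utterance-level heads are all assumptions. The encoder is passed in so the sketch stays independent of how the Wav2Vec2 weights are loaded.

```python
import torch
import torch.nn as nn


class Wav2Vec2MultiTasks(nn.Module):
    """Sketch of one shared encoder feeding three task heads.

    Assumption: `encoder` returns an object with a `last_hidden_state`
    tensor of shape (batch, frames, hidden_size).
    """

    def __init__(self, encoder, hidden_size, vocab_size, num_speakers, num_emotions):
        super().__init__()
        self.encoder = encoder
        # Frame-level head: one distribution over characters per frame (for CTC).
        self.ctc_head = nn.Linear(hidden_size, vocab_size)
        # Utterance-level heads: one prediction per audio clip.
        self.speaker_head = nn.Linear(hidden_size, num_speakers)
        self.emotion_head = nn.Linear(hidden_size, num_emotions)

    def forward(self, input_values):
        hidden = self.encoder(input_values).last_hidden_state
        ctc_logits = self.ctc_head(hidden)
        # Assumption: average over time to get one vector per utterance.
        # This simple mean does not mask padded frames.
        pooled = hidden.mean(dim=1)
        return ctc_logits, self.speaker_head(pooled), self.emotion_head(pooled)
```

With the `transformers` library, the encoder could be `Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")`, whose hidden size is 768. How the three losses are weighted during training is not stated here, so the sketch stops at the logits.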

Custom data preprocessing:

Cleans transcripts (removes punctuation & special characters)

Converts numbers into words

Builds a vocabulary and tokenizer

Filters short/invalid audio
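The preprocessing steps above might look something like the sketch below. All function names, the `[PAD]`/`[UNK]`/`|` token choices, the decision to keep apostrophes, and the 0.5 second minimum duration are assumptions made for illustration; the number converter is a small hand-written stand-in that spells out 0 to 999 and reads longer numbers digit by digit.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n):
    """Spell out a non-negative integer (0-999; larger values digit by digit)."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + number_to_words(rest)
        return _ONES[hundreds] + " hundred" + tail
    return " ".join(_ONES[int(d)] for d in str(n))


def clean_transcript(text):
    """Convert digits to words, lowercase, and drop punctuation."""
    text = re.sub(r"\d+", lambda m: " " + number_to_words(int(m.group())) + " ", text)
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)  # assumption: keep letters and apostrophes
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(transcripts):
    """Map each character to an id; '|' stands for the space between words."""
    vocab = {"[PAD]": 0, "[UNK]": 1}  # assumption: [PAD] doubles as the CTC blank
    for ch in sorted(set("".join(transcripts))):
        vocab["|" if ch == " " else ch] = len(vocab)
    return vocab


def is_valid_audio(num_samples, sample_rate=16000, min_seconds=0.5):
    """Reject clips shorter than a (hypothetical) minimum duration."""
    return num_samples >= min_seconds * sample_rate
```

For example, `clean_transcript("It's 42, OK?")` yields `"it's forty two ok"`.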

Training, validation, and test splits with collators for CTC.
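A CTC collator has to pad variable-length audio and variable-length label sequences to a common length within each batch. The sketch below shows the idea with plain Python lists; the field names (`input_values`, `labels`, `speaker_id`, `emotion_id`) and the label padding value of -100 are assumptions, and a real collator would normally return tensors.

```python
def collate_ctc(batch, label_pad_id=-100):
    """Pad audio with zeros and labels with `label_pad_id` to the batch maximum."""
    max_audio = max(len(ex["input_values"]) for ex in batch)
    max_label = max(len(ex["labels"]) for ex in batch)
    out = {"input_values": [], "attention_mask": [], "labels": [],
           "speaker_id": [], "emotion_id": []}
    for ex in batch:
        n_audio = len(ex["input_values"])
        n_label = len(ex["labels"])
        out["input_values"].append(list(ex["input_values"]) + [0.0] * (max_audio - n_audio))
        # 1 marks real samples, 0 marks padding.
        out["attention_mask"].append([1] * n_audio + [0] * (max_audio - n_audio))
        # Padded label positions are meant to be ignored by the loss.
        out["labels"].append(list(ex["labels"]) + [label_pad_id] * (max_label - n_label))
        out["speaker_id"].append(ex["speaker_id"])
        out["emotion_id"].append(ex["emotion_id"])
    return out
```

The speaker and emotion labels need no padding because each utterance has exactly one of each.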

Evaluation metrics:

Character Error Rate (CER) for speech recognition

Accuracy for speaker and emotion classification
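For reference, both metrics can be computed with a few lines of standard-library Python. CER is the character-level edit distance between hypothesis and reference, divided by the number of reference characters; accuracy is the fraction of correct predictions. The function names here are illustrative, and pooling edits over the whole corpus (rather than averaging per utterance) is an assumption about how the score is aggregated.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


def accuracy(predictions, labels):
    """Fraction of predictions that equal their label."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Note that CER can exceed 1.0 when the hypothesis is much longer than the reference.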