---
license: apache-2.0
tags:
- fine-tune
metrics:
- accuracy
model-index:
- name: wav2vec2-base-indonesian-speech-emotion-recognition
  results: []
language:
- id
pipeline_tag: audio-classification
library_name: transformers
base_model:
- facebook/wav2vec2-base
---

# Indonesian Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0

The model is a fine-tuned version
of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base)
for a Speech Emotion Recognition (SER) task.

The dataset used to fine-tune the pre-trained model is
the [RAVDESS dataset](https://zenodo.org/record/1188976#.YO6yI-gzaUk). It provides 1,440 recordings
of actors expressing 5 different emotions in Bahasa Indonesia:

```python
emotions = ['angry', 'disgust', 'fear', 'happy', 'sad']
```
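These labels correspond to the class indices of the model's classification head. A minimal sketch of the mapping (the alphabetical ordering shown above is an assumption and may differ from the actual model config):

```python
# Assumed id2label/label2id mapping for the 5-class SER head; the
# ordering follows the alphabetical list above, not a verified config.
emotions = ['angry', 'disgust', 'fear', 'happy', 'sad']
id2label = {i: label for i, label in enumerate(emotions)}
label2id = {label: i for i, label in enumerate(emotions)}
```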

It achieves the following results on the evaluation set:

- Loss: 0.5023
- Accuracy: 0.8223

## Model description

More information needed

## Intended uses & limitations

More information needed
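Pending fuller documentation, a minimal inference sketch (assumed usage, not confirmed by the author: the generic `transformers` audio-classification pipeline with the repository id from the citation below; the audio path is a placeholder):

```python
# Hedged sketch: run the model through the generic audio-classification
# pipeline. Requires `transformers` and `torch`; the import is lazy so
# the function can be defined without them installed.
def predict_emotion(audio_path: str):
    from transformers import pipeline
    classifier = pipeline(
        "audio-classification",
        model="alianurrahman/wav2vec2-base-indonesian-speech-emotion-recognition",
    )
    # Returns a list of {"label": ..., "score": ...} dicts, one per emotion.
    return classifier(audio_path)

if __name__ == "__main__":
    print(predict_emotion("sample.wav"))
```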

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-5
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 10
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
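The list above can be collected into a dict keyed by the standard `transformers.TrainingArguments` field names (a sketch under that assumption; this is not the original training script, and the Adam betas/epsilon shown above are the HF optimizer defaults, so they need no explicit argument):

```python
# Hyperparameters from the list above, keyed by the standard
# transformers.TrainingArguments names (assumed mapping).
hparams = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,  # Native AMP mixed-precision training
}
# Usage (requires transformers): TrainingArguments("output_dir", **hparams)
```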

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
|    2.0752     | 0.21  |  30  |     2.0505      |  0.1359  |
|    2.0119     | 0.42  |  60  |     1.9340      |  0.2474  |
|    1.8073     | 0.63  |  90  |     1.5169      |  0.3902  |
|    1.5418     | 0.84  | 120  |     1.2373      |  0.5610  |
|    1.1432     | 1.05  | 150  |     1.1579      |  0.5610  |
|    0.9645     | 1.26  | 180  |     0.9610      |  0.6167  |
|    0.8811     | 1.47  | 210  |     0.8063      |  0.7178  |
|    0.8756     | 1.68  | 240  |     0.7379      |  0.7352  |
|    0.8208     | 1.89  | 270  |     0.6839      |  0.7596  |
|    0.7118     |  2.1  | 300  |     0.6664      |  0.7735  |
|    0.4261     | 2.31  | 330  |     0.6058      |  0.8014  |
|    0.4394     | 2.52  | 360  |     0.5754      |  0.8223  |
|    0.4581     | 2.72  | 390  |     0.4719      |  0.8467  |
|    0.3967     | 2.93  | 420  |     0.5023      |  0.8223  |

## Citation

```bibtex
@misc{alianur_rahman_2024,
  author = { {Alianur Rahman} },
  title  = { wav2vec2-base-indonesian-speech-emotion-recognition (Revision 1fcfcf1) },
  year   = 2024,
  url    = { https://huggingface.co/alianurrahman/wav2vec2-base-indonesian-speech-emotion-recognition }
}
```

## Contact

For any questions, contact me on [Twitter](https://x.com/alianur_rahman).

### Framework versions

- Transformers 4.45.1
- Pytorch 2.2.2
- Datasets 3.0.1
- Tokenizers 0.20.0