update description and move files
- README.md +51 -0
- test_results.csv → _test/test_results.csv +0 -0
- config.yaml → conf/config.yaml +0 -0
README.md
CHANGED
@@ -28,3 +28,54 @@ model-index:
|
|
28 |
name: Unweighted Average Recall
|
29 |
value: 0.6499883154795764
|
30 |
---
|
31 |
+
|
32 |
+
# Speech Emotion Recognition Model
|
33 |
+
|
34 |
+
`Wav2Vec2-Large-Robust` model fine-tuned on the [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html)
|
35 |
+
(v1.11) dataset for classifying emotions into four categories: _Anger (A)_, _Happiness (H)_, _Neutral (N)_, and _Sadness (S)_.
|
36 |
+
|
37 |
+
## Installation
|
38 |
+
|
39 |
+
To use the model, install autrainer, e.g., via pip:
|
40 |
+
|
41 |
+
```bash
|
42 |
+
pip install autrainer
|
43 |
+
```
|
44 |
+
|
45 |
+
## Usage
|
46 |
+
|
47 |
+
The model can be applied to all audio files in a folder (`<data-root>`), with the predictions stored in another folder (`<output-root>`):
|
48 |
+
|
49 |
+
```bash
|
50 |
+
autrainer inference hf:autrainer/msp-podcast-emo-class-big4-w2v2-l-emo <data-root> <output-root>
|
51 |
+
```
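
For batch jobs it can be convenient to call the same command from Python. The following is a minimal sketch that only wraps the CLI call shown above with the standard-library `subprocess` module; it does not use any autrainer Python API. The helper names `build_inference_command` and `run_inference` and the folder names `audio` and `predictions` are hypothetical.

```python
import subprocess
from pathlib import Path

# Model identifier copied from the CLI example above.
MODEL_ID = "hf:autrainer/msp-podcast-emo-class-big4-w2v2-l-emo"


def build_inference_command(data_root, output_root):
    """Return the argument list for the `autrainer inference` CLI call."""
    return [
        "autrainer",
        "inference",
        MODEL_ID,
        str(Path(data_root)),
        str(Path(output_root)),
    ]


def run_inference(data_root, output_root):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_inference_command(data_root, output_root), check=True)


if __name__ == "__main__":
    # Hypothetical folders; only print the command here instead of running it.
    print(" ".join(build_inference_command("audio", "predictions")))
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.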
|
52 |
+
|
53 |
+
## Training
|
54 |
+
|
55 |
+
### Pretraining
|
56 |
+
|
57 |
+
The model was originally trained on the MSP-Podcast (v1.7) dataset by [audEERING](https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim) to predict three emotional dimensions: _arousal_, _dominance_, and _valence_.
|
58 |
+
|
59 |
+
### Dataset
|
60 |
+
|
61 |
+
The model was further fine-tuned on the MSP-Podcast (v1.11) dataset, a large corpus of spontaneous emotional speech collected from various podcast recordings.
|
62 |
+
The dataset includes natural emotional expressions which cover a broad range of speakers, recording conditions, and conversation topics.
|
63 |
+
|
64 |
+
**Note:** The MSP-Podcast dataset is not yet included in the autrainer 0.5.0 release but can be found in [this Pull Request](https://github.com/autrainer/autrainer/pull/46).
|
65 |
+
|
66 |
+
### Training Process
|
67 |
+
|
68 |
+
The model was fine-tuned for 5 epochs.
|
69 |
+
At the end of each epoch, the model was evaluated on the validation set.
|
70 |
+
We release the state that achieved the best performance on this validation set.
|
71 |
+
All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).
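
The selection of the released state described above can be sketched as follows. This is an illustration only, not autrainer's implementation: it assumes a validation metric where higher is better and breaks ties in favor of the earliest epoch, neither of which is stated in this README. The function name `select_best_epoch` and the scores are made up.

```python
def select_best_epoch(val_scores):
    """Return (epoch, score) for the highest validation score.

    Epochs are 1-indexed. Ties go to the earliest epoch (an assumption).
    """
    if not val_scores:
        raise ValueError("val_scores must contain at least one epoch")
    # max() returns the first maximal index, so earlier epochs win ties.
    best_index = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_index + 1, val_scores[best_index]


# Made-up validation scores for 5 epochs, for illustration only.
example_scores = [0.58, 0.61, 0.64, 0.63, 0.62]
best_epoch, best_score = select_best_epoch(example_scores)
print(f"best epoch: {best_epoch}, validation score: {best_score}")
```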
|
72 |
+
|
73 |
+
### Evaluation
|
74 |
+
|
75 |
+
We evaluate the model on the `Test1` split of the MSP-Podcast dataset.
|
76 |
+
The model achieves a classification accuracy of 0.617 on the test set.
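
The model card metadata reports an Unweighted Average Recall, while the sentence above reports accuracy. The two metrics differ when classes are imbalanced: accuracy counts every sample equally, whereas unweighted average recall averages the per-class recalls. The sketch below shows the difference on made-up labels; it does not use the actual test results, and the function names are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, each class weighted equally."""
    total = defaultdict(int)  # samples per true class
    hits = defaultdict(int)   # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Toy, imbalanced labels over the four classes (A, H, N, S); not real data.
y_true = ["N"] * 6 + ["A"] * 2 + ["H"] + ["S"]
y_pred = ["N"] * 6 + ["A", "N"] + ["N"] + ["S"]

print("accuracy:", accuracy(y_true, y_pred))
print("UAR:", unweighted_average_recall(y_true, y_pred))
```

In this toy case the majority class is predicted well, so accuracy is higher than the unweighted average recall; on other data the relation can be reversed.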
|
77 |
+
|
78 |
+
## Acknowledgements
|
79 |
+
|
80 |
+
Please acknowledge the work that produced the original model and the MSP-Podcast dataset.
|
81 |
+
We would also appreciate an acknowledgement of autrainer.
|
test_results.csv → _test/test_results.csv
RENAMED
File without changes
|
config.yaml → conf/config.yaml
RENAMED
File without changes
|