ramppdev committed on
Commit b1ddeac · 1 Parent(s): 6535f24

update description and move files

README.md CHANGED
@@ -28,3 +28,54 @@ model-index:
28
  name: Unweighted Average Recall
29
  value: 0.6499883154795764
30
  ---
31
+
32
+ # Speech Emotion Recognition Model
33
+
34
+ `Wav2Vec2-Large-Robust` model fine-tuned on the [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html)
35
+ (v1.11) dataset for classifying emotions into four categories: _Anger (A)_, _Happiness (H)_, _Neutral (N)_, and _Sadness (S)_.
36
+
37
+ ## Installation
38
+
39
+ To use the model, install autrainer, e.g., via pip:
40
+
41
+ ```bash
42
+ pip install autrainer
43
+ ```
44
+
45
+ ## Usage
46
+
47
+ The model can be applied to all audio files in a folder (`<data-root>`), with the predictions stored in another folder (`<output-root>`):
48
+
49
+ ```bash
50
+ autrainer inference hf:autrainer/msp-podcast-emo-class-big4-w2v2-l-emo <data-root> <output-root>
51
+ ```
52
+
53
+ ## Training
54
+
55
+ ### Pretraining
56
+
57
+ The model was originally trained on the MSP-Podcast (v1.7) dataset by [audEERING](https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim) to predict three emotional dimensions: _arousal_, _dominance_, and _valence_.
58
+
59
+ ### Dataset
60
+
61
+ The model was further fine-tuned on the MSP-Podcast (v1.11) dataset, a large corpus of spontaneous emotional speech collected from various podcast recordings.
62
+ The dataset includes natural emotional expressions which cover a broad range of speakers, recording conditions, and conversation topics.
63
+
64
+ **Note:** The MSP-Podcast dataset is not yet included in the autrainer 0.5.0 release but can be found in [this Pull Request](https://github.com/autrainer/autrainer/pull/46).
65
+
66
+ ### Training Process
67
+
68
+ The model was fine-tuned for 5 epochs.
69
+ At the end of each epoch, the model was evaluated on the validation set.
70
+ We release the state that achieved the best performance on this validation set.
71
+ All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).
72
+
73
+ ### Evaluation
74
+
75
+ We evaluate the model on the `Test1` split of the MSP-Podcast dataset.
76
+ The model achieves a classification accuracy of 0.617 on the test set.
77
+
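The README body reports accuracy (0.617), while the model-index metadata reports Unweighted Average Recall (0.6499883154795764). The two metrics differ whenever the classes are imbalanced. The following is a minimal sketch of how each could be computed from reference labels and predictions; it is not autrainer's implementation, and the function names and toy labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = defaultdict(int)  # number of reference samples per class
    hits = defaultdict(int)    # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example (hypothetical data) using the four class abbreviations A, H, N, S.
y_true = ["A", "A", "H", "N", "N", "N", "N", "S"]
y_pred = ["A", "N", "H", "N", "N", "N", "H", "S"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"UAR:      {unweighted_average_recall(y_true, y_pred):.4f}")
```

In this toy data the majority class (N) dominates accuracy, whereas UAR averages the recall of each class with equal weight, which is why the two values can diverge on an imbalanced test set.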
78
+ ## Acknowledgements
79
+
80
+ Please acknowledge the work that produced the original model and the MSP-Podcast dataset.
81
+ We would also appreciate an acknowledgement of autrainer.
test_results.csv → _test/test_results.csv RENAMED
File without changes
config.yaml → conf/config.yaml RENAMED
File without changes