update description and move files

Files changed (3)

README.md +51 -0
test_results.csv → _test/test_results.csv +0 -0
config.yaml → conf/config.yaml +0 -0

README.md CHANGED

@@ -28,3 +28,54 @@ model-index:
             name: Unweighted Average Recall
             value: 0.6499883154795764
 ---

+# Speech Emotion Recognition Model
+A `Wav2Vec2-Large-Robust` model fine-tuned on the [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html)
+(v1.11) dataset for classifying emotions into four categories: _Anger (A)_, _Happiness (H)_, _Neutral (N)_, and _Sadness (S)_.
+## Installation
+To use the model, install autrainer, e.g., via pip:
+```bash
+pip install autrainer
+```
+## Usage
+The model can be applied to all audio files in a folder (`<data-root>`), with the predictions stored in another folder (`<output-root>`):
+```bash
+autrainer inference hf:autrainer/msp-podcast-emo-class-big4-w2v2-l-emo <data-root> <output-root>
+```
+## Training
+### Pretraining
+The model was originally trained on the MSP-Podcast (v1.7) dataset by [audEERING](https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim) to predict three emotional dimensions: _arousal_, _dominance_, and _valence_.
+### Dataset
+The model was further fine-tuned on the MSP-Podcast (v1.11) dataset, a large corpus of spontaneous emotional speech collected from various podcast recordings.
+The dataset contains natural emotional expressions that cover a broad range of speakers, recording conditions, and conversation topics.
+**Note:** The MSP-Podcast dataset is not yet included in the autrainer 0.5.0 release but can be found in [this Pull Request](https://github.com/autrainer/autrainer/pull/46).
+### Training Process
+The model was fine-tuned for 5 epochs.
+At the end of each epoch, the model was evaluated on the validation set.
+We release the model state that achieved the best performance on this validation set.
+All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).
+### Evaluation
+We evaluate the model on the `Test1` split of the MSP-Podcast dataset.
+The model achieves a classification accuracy of 0.617 on the test set; the model-index metadata lists an Unweighted Average Recall of approximately 0.650.
+## Acknowledgements
+Please acknowledge the work that produced the original model and the MSP-Podcast dataset.
+We would also appreciate an acknowledgement of autrainer.
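
The README above reports a classification accuracy (0.617), while its model-index metadata names Unweighted Average Recall (UAR) as the metric. The two can differ noticeably when classes are imbalanced. The following is a minimal sketch of how both quantities are commonly defined, not autrainer's own implementation. The function names and the toy labels are hypothetical and are not taken from MSP-Podcast or `_test/test_results.csv`.

```python
from collections import Counter

# The four classes named in the README: Anger, Happiness, Neutral, Sadness.
LABELS = ["A", "H", "N", "S"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred, labels=LABELS):
    """Mean of the per-class recalls; each class counts equally, whatever its size."""
    support = Counter(y_true)  # number of reference samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    # Classes absent from the reference labels are skipped to avoid division by zero.
    recalls = [hits[c] / support[c] for c in labels if support[c] > 0]
    return sum(recalls) / len(recalls)


# Hypothetical toy data with a dominant Neutral class (illustration only).
y_true = ["N", "N", "N", "N", "A", "H", "S", "S"]
y_pred = ["N", "N", "N", "H", "A", "N", "S", "N"]

print("accuracy:", accuracy(y_true, y_pred))
print("UAR:", unweighted_average_recall(y_true, y_pred))
```

Because UAR averages recall per class rather than per sample, a class with few samples influences it as much as a frequent one, which is why it is often reported alongside accuracy for imbalanced emotion datasets.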

test_results.csv → _test/test_results.csv RENAMED

File without changes

config.yaml → conf/config.yaml RENAMED

File without changes