wavlm-large_finetuned_RAVDESS

This model is a fine-tuned version of microsoft/wavlm-large on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). After the final training epoch (epoch 20, step 180), it achieves the following results on the evaluation set:

Loss: 0.3534
Accuracy: 0.9028

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
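
The relationship between these values can be checked with a little arithmetic. The sketch below is not the original training code. It assumes a single device (so that total_train_batch_size is train_batch_size times gradient_accumulation_steps), takes the 9 optimizer steps per epoch from the training results table, and assumes the linear scheduler ramps up linearly during warmup and then decays linearly to zero, which is how a linear schedule with warmup commonly behaves. The function name linear_lr is hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 20

# Taken from the training results table: step 9 is reached at epoch 1.0.
STEPS_PER_EPOCH = 9

# Assumption: one device, so the effective batch is batch size x accumulation.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 128

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS  # 180
warmup_steps = round(total_steps * WARMUP_RATIO)  # 18


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 9, 18, 99, 180):
        print(f"step {step:3d}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.0003 at step 18, which is the end of epoch 2 in the table, and reaches zero at step 180.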

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	9	2.0485	0.2326
2.0712	2.0	18	1.8028	0.2917
1.9355	3.0	27	1.7300	0.3229
1.7116	4.0	36	1.3749	0.4722
1.4907	5.0	45	1.0586	0.6493
1.1558	6.0	54	0.8834	0.6771
0.8621	7.0	63	0.9206	0.6944
0.6437	8.0	72	0.5895	0.8194
0.4634	9.0	81	0.7389	0.7743
0.3974	10.0	90	0.4569	0.8542
0.3974	11.0	99	0.5140	0.8438
0.3105	12.0	108	0.4273	0.8611
0.2094	13.0	117	0.3608	0.8993
0.1401	14.0	126	0.5715	0.8194
0.1249	15.0	135	0.3715	0.8854
0.0953	16.0	144	0.4112	0.8785
0.0955	17.0	153	0.3692	0.9062
0.0807	18.0	162	0.4395	0.8646
0.1077	19.0	171	0.3413	0.9201
0.0578	20.0	180	0.3534	0.9028
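
The headline metrics on this card match the last row of the table, not the best one. As a minimal sketch, the snippet below copies the validation columns from the table and finds the best epoch by accuracy and by validation loss. The names EpochResult and RESULTS are illustrative only, and the card does not state whether a checkpoint from any epoch other than the last was kept.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    val_loss: float
    accuracy: float


# Epoch, validation loss, and accuracy copied from the training results table.
RESULTS = [
    EpochResult(1, 2.0485, 0.2326),
    EpochResult(2, 1.8028, 0.2917),
    EpochResult(3, 1.7300, 0.3229),
    EpochResult(4, 1.3749, 0.4722),
    EpochResult(5, 1.0586, 0.6493),
    EpochResult(6, 0.8834, 0.6771),
    EpochResult(7, 0.9206, 0.6944),
    EpochResult(8, 0.5895, 0.8194),
    EpochResult(9, 0.7389, 0.7743),
    EpochResult(10, 0.4569, 0.8542),
    EpochResult(11, 0.5140, 0.8438),
    EpochResult(12, 0.4273, 0.8611),
    EpochResult(13, 0.3608, 0.8993),
    EpochResult(14, 0.5715, 0.8194),
    EpochResult(15, 0.3715, 0.8854),
    EpochResult(16, 0.4112, 0.8785),
    EpochResult(17, 0.3692, 0.9062),
    EpochResult(18, 0.4395, 0.8646),
    EpochResult(19, 0.3413, 0.9201),
    EpochResult(20, 0.3534, 0.9028),
]

best_by_accuracy = max(RESULTS, key=lambda r: r.accuracy)
best_by_loss = min(RESULTS, key=lambda r: r.val_loss)
final = RESULTS[-1]

if __name__ == "__main__":
    print("best accuracy:", best_by_accuracy)
    print("lowest validation loss:", best_by_loss)
    print("final epoch:", final)
```

Both criteria select epoch 19 (validation loss 0.3413, accuracy 0.9201), slightly better than the final-epoch figures of 0.3534 and 0.9028. Validation accuracy also moves by several points between neighbouring epochs from epoch 13 onward, so a difference of this size should be read with caution.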

Framework versions

Transformers 4.45.2
PyTorch 2.5.0+cu118
Datasets 3.0.1
Tokenizers 0.20.1