mms-1b-allFTwPTtok-mixat-tri-ara

This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2299
  • WER: 0.9754

Model description

More information needed; the repository metadata lists the checkpoint at roughly 965M parameters, stored as FP16 Safetensors.

Intended uses & limitations

More information needed
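
The card provides no usage details, but the checkpoint should load like other MMS fine-tunes. A minimal inference sketch follows; assumptions: the repo ID is inferred from the card title, the audio path is a placeholder, and plain greedy CTC decoding is used rather than whatever decoding the authors evaluated with.

```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForCTC

# Hypothetical repo ID, inferred from the card title.
MODEL_ID = "sqrk/mms-1b-allFTwPTtok-mixat-tri-ara"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID)

# MMS models expect 16 kHz mono audio; "example.wav" is a placeholder.
speech, _ = librosa.load("example.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the argmax token at each frame, then collapse.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```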

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
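
For orientation, these settings map onto a transformers TrainingArguments configuration roughly as follows. This is a sketch, not the actual training script; the model setup, data collator, and Trainer wiring are omitted, and output_dir is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mms-1b-allFTwPTtok-mixat-tri-ara",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = 16 effective batch
    optim="adamw_torch",            # AdamW; betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                      # native AMP mixed precision
)
```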

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 15.1581       | 1.0     | 210  | 9.5796          | 0.8749 |
| 10.6357       | 2.0     | 420  | 5.9444          | 0.9257 |
| 5.1921        | 3.0     | 630  | 3.8132          | 0.9772 |
| 3.7964        | 4.0     | 840  | 3.2935          | 0.9758 |
| 3.4423        | 5.0     | 1050 | 3.0666          | 0.9669 |
| 3.2654        | 6.0     | 1260 | 2.9011          | 0.9689 |
| 3.1153        | 7.0     | 1470 | 2.7733          | 0.9727 |
| 3.0151        | 7.9982  | 1672 | 2.6665          | 0.9703 |
| 2.9164        | 9.0     | 1882 | 2.5764          | 0.9704 |
| 2.8297        | 10.0    | 2092 | 2.5101          | 0.9649 |
| 2.7763        | 11.0    | 2302 | 2.4446          | 0.9738 |
| 2.7362        | 12.0    | 2512 | 2.4067          | 0.9723 |
| 2.6919        | 13.0    | 2722 | 2.3765          | 0.9775 |
| 2.6551        | 14.0    | 2932 | 2.3572          | 0.9721 |
| 2.6544        | 15.0    | 3142 | 2.3300          | 0.9792 |
| 2.6227        | 16.0    | 3352 | 2.3215          | 0.9740 |
| 2.6194        | 17.0    | 3562 | 2.3054          | 0.9754 |
| 2.6107        | 18.0    | 3772 | 2.2895          | 0.9774 |
| 2.5884        | 19.0    | 3982 | 2.2749          | 0.9807 |
| 2.5803        | 20.0    | 4192 | 2.2710          | 0.9743 |
| 2.5688        | 21.0    | 4402 | 2.2606          | 0.9763 |
| 2.5607        | 22.0    | 4612 | 2.2557          | 0.9741 |
| 2.5561        | 23.0    | 4822 | 2.2469          | 0.9767 |
| 2.5393        | 24.0    | 5032 | 2.2413          | 0.9778 |
| 2.5376        | 25.0    | 5242 | 2.2394          | 0.9746 |
| 2.53          | 26.0    | 5452 | 2.2369          | 0.9735 |
| 2.5457        | 27.0    | 5662 | 2.2334          | 0.9750 |
| 2.5361        | 28.0    | 5872 | 2.2313          | 0.9757 |
| 2.5405        | 29.0    | 6082 | 2.2308          | 0.9754 |
| 2.5384        | 29.8979 | 6270 | 2.2299          | 0.9754 |
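
For reference, WER is the word error rate: (substitutions + deletions + insertions) divided by the number of reference words, so values near 0.98 mean almost every reference word is transcribed incorrectly. A quick way to compute it on your own transcripts, using the jiwer package (an assumption; the metric implementation used during training is not shown in this card):

```python
import jiwer

references = ["a reference transcript"]   # ground-truth text (placeholder)
hypotheses = ["a hypothesis transcript"]  # model output (placeholder)

# jiwer.wer returns (S + D + I) / N aggregated over the whole corpus.
print(jiwer.wer(references, hypotheses))
```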

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.16.1
  • Tokenizers 0.21.1