mms-1b-allFT-mixat-tri-ara

This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the results):

  • Loss: 2.6956
  • WER: 1.0091 (a WER above 1.0 means the total number of edits exceeds the number of reference words)
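
No usage code is provided in this card. Below is a minimal inference sketch, assuming the checkpoint follows the standard Wav2Vec2-style CTC interface of facebook/mms-1b-all; the audio file name is a placeholder and the input must be 16 kHz mono audio:

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "sqrk/mms-1b-allFT-mixat-tri-ara"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS/Wav2Vec2 models expect 16 kHz mono audio; "sample.wav" is a placeholder path.
audio, _ = librosa.load("sample.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```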

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch of this configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
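
A hedged reconstruction of these settings as a transformers TrainingArguments object is sketched below; the output directory is a placeholder, and dataset, model, and data-collator setup are omitted:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; output_dir is a placeholder name.
training_args = TrainingArguments(
    output_dir="mms-1b-allFT-mixat-tri-ara",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # 2 per device x 8 accumulation steps -> effective batch of 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                       # Native AMP mixed-precision training
)
```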

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|---------------|---------|------|-----------------|--------|
| 31.0714       | 1.0     | 210  | 24.4350         | 1.0003 |
| 26.8056       | 2.0     | 420  | 18.3719         | 1.0005 |
| 14.3797       | 3.0     | 630  | 5.9675          | 1.0    |
| 4.878         | 4.0     | 840  | 4.0668          | 1.0    |
| 4.0485        | 5.0     | 1050 | 3.8871          | 1.0    |
| 3.9462        | 6.0     | 1260 | 3.8183          | 1.0    |
| 3.8848        | 7.0     | 1470 | 3.7649          | 1.0    |
| 3.8365        | 8.0     | 1680 | 3.7236          | 1.0    |
| 3.7944        | 9.0     | 1890 | 3.6770          | 0.9998 |
| 3.7476        | 10.0    | 2100 | 3.6489          | 0.9995 |
| 3.7137        | 11.0    | 2310 | 3.6138          | 0.9995 |
| 3.6835        | 12.0    | 2520 | 3.5817          | 0.9995 |
| 3.6532        | 13.0    | 2730 | 3.5456          | 0.9994 |
| 3.6131        | 14.0    | 2940 | 3.4966          | 0.9995 |
| 3.5702        | 15.0    | 3150 | 3.4312          | 0.9994 |
| 3.5056        | 16.0    | 3360 | 3.3456          | 0.9994 |
| 3.4263        | 17.0    | 3570 | 3.2426          | 0.9994 |
| 3.3568        | 18.0    | 3780 | 3.1248          | 0.9997 |
| 3.2472        | 19.0    | 3990 | 3.0322          | 1.0011 |
| 3.1795        | 20.0    | 4200 | 2.9561          | 1.0105 |
| 3.1379        | 21.0    | 4410 | 2.8892          | 1.0114 |
| 3.0804        | 22.0    | 4620 | 2.8381          | 1.0106 |
| 3.0336        | 23.0    | 4830 | 2.8054          | 1.0148 |
| 3.0099        | 24.0    | 5040 | 2.7714          | 1.0117 |
| 2.9848        | 25.0    | 5250 | 2.7454          | 1.0114 |
| 2.9579        | 26.0    | 5460 | 2.7276          | 1.0111 |
| 2.9503        | 27.0    | 5670 | 2.7132          | 1.0112 |
| 2.934         | 28.0    | 5880 | 2.7030          | 1.0103 |
| 2.9293        | 29.0    | 6090 | 2.6973          | 1.0089 |
| 2.9726        | 29.8597 | 6270 | 2.6956          | 1.0091 |
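
The WER column above can be reproduced with the evaluate library; this is a minimal sketch, assuming lists of predicted and reference transcriptions for the evaluation set are already available (the example strings below are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcriptions; in practice, predictions come from running the
# model over the evaluation set and references are the ground-truth labels.
predictions = ["مرحبا بالعالم"]
references = ["مرحبا بكم في العالم"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```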

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.16.1
  • Tokenizers 0.21.1