wav2vec2-large-xlsr-300m-nepali

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Nepali speech recognition.
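
As a quick usage reference, here is a minimal transcription sketch using the standard 🤗 Transformers CTC API. The 16 kHz resampling step and the example audio path are illustrative assumptions, not part of the original card:

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "prajin/wav2vec2-large-xlsr-300m-nepali"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

# XLS-R checkpoints expect 16 kHz mono audio; librosa resamples on load.
speech, _ = librosa.load("nepali_sample.wav", sr=16_000)  # hypothetical input file

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding (no language model).
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```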

The datasets used for training are:

  1. OpenSLR-54 Corpus
  2. External Data

For evaluation on a publicly available dataset, the OpenSLR-43 corpus can be used; the model was not trained on this data. The corpus is available on Hugging Face as a train split, on which the model achieves 27% WER and 8.3% CER with a 5-gram language model.

Script to evaluate the model on the OpenSLR-43 train set:

python3 eval.py --model_id prajin/wav2vec2-large-xlsr-300m-nepali --dataset openslr --config SLR43  --split train --log_outputs
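
For a programmatic check without eval.py, the same split can be scored with 🤗 Datasets and jiwer. This is a sketch under two assumptions: that the Hub's openslr loading script exposes SLR43 with audio and sentence columns, and that greedy decoding is used, so the scores correspond to the no-LM setting rather than the 5-gram numbers reported above.

```python
import torch
from datasets import Audio, load_dataset
from jiwer import cer, wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "prajin/wav2vec2-large-xlsr-300m-nepali"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

# OpenSLR-43 (Nepali) via the Hub's openslr loading script, resampled to 16 kHz.
ds = load_dataset("openslr", "SLR43", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds:
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predictions.append(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
    references.append(sample["sentence"])

print(f"WER: {wer(references, predictions):.4f}")
print(f"CER: {cer(references, predictions):.4f}")
```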

The evaluation results below were computed on 10,000 samples held out from the full training dataset.

Without a language model, it achieves the following results on this evaluation set:

  • Loss: 0.2625
  • WER: 0.3426

With a 5-gram language model:

  • WER: 0.2502
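
The 5-gram result relies on beam-search decoding with a KenLM language model. A minimal sketch using Wav2Vec2ProcessorWithLM follows; it assumes the Hub repo bundles a pyctcdecode/KenLM decoder, which the card does not confirm. Otherwise a decoder must be built locally from the 5-gram ARPA file.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

MODEL_ID = "prajin/wav2vec2-large-xlsr-300m-nepali"

# Loads the tokenizer, feature extractor, and (if present) the bundled
# pyctcdecode/KenLM decoder from the repo.
processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

def transcribe_with_lm(speech_16khz):
    """Beam-search decode a 16 kHz waveform with the language model."""
    inputs = processor(speech_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # batch_decode on the LM processor runs pyctcdecode beam search over
    # the raw logits instead of greedy argmax decoding.
    return processor.batch_decode(logits.numpy()).text[0]
```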

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1
  • mixed_precision_training: Native AMP
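
Expressed as 🤗 TrainingArguments, the settings above map to roughly the following. This is a reconstruction from the list, not the author's actual training script, and output_dir is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xlsr-300m-nepali",  # placeholder path
    learning_rate=6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=1,
    fp16=True,  # "Native AMP" mixed-precision training
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the optimizer defaults.
)
```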

Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.3468        | 0.04  | 400   | 0.2624          | 0.3479 |
| 0.2792        | 0.08  | 800   | 0.2696          | 0.3490 |
| 0.245         | 0.12  | 1200  | 0.2751          | 0.3502 |
| 0.2453        | 0.16  | 1600  | 0.2754          | 0.3523 |
| 0.2332        | 0.2   | 2000  | 0.2779          | 0.3517 |
| 0.2321        | 0.24  | 2400  | 0.2775          | 0.3528 |
| 0.2708        | 0.28  | 2800  | 0.2764          | 0.3533 |
| 0.2709        | 0.32  | 3200  | 0.2723          | 0.3544 |
| 0.2715        | 0.36  | 3600  | 0.2739          | 0.3545 |
| 0.2732        | 0.4   | 4000  | 0.2707          | 0.3498 |
| 0.2643        | 0.44  | 4400  | 0.2696          | 0.3499 |
| 0.2682        | 0.47  | 4800  | 0.2672          | 0.3492 |
| 0.2687        | 0.51  | 5200  | 0.2644          | 0.3474 |
| 0.269         | 0.55  | 5600  | 0.2619          | 0.3502 |
| 0.2675        | 0.59  | 6000  | 0.2606          | 0.3477 |
| 0.2656        | 0.63  | 6400  | 0.2597          | 0.3463 |
| 0.2667        | 0.67  | 6800  | 0.2607          | 0.3458 |
| 0.2639        | 0.71  | 7200  | 0.2601          | 0.3480 |
| 0.2631        | 0.75  | 7600  | 0.2582          | 0.3447 |
| 0.2589        | 0.79  | 8000  | 0.2577          | 0.3438 |
| 0.2554        | 0.83  | 8400  | 0.2557          | 0.3439 |
| 0.2687        | 0.87  | 8800  | 0.2546          | 0.3438 |
| 0.2574        | 0.91  | 9200  | 0.2537          | 0.3434 |
| 0.2623        | 0.95  | 9600  | 0.2530          | 0.3433 |
| 0.2675        | 0.99  | 10000 | 0.2530          | 0.3426 |

Framework versions

  • Transformers 4.16.0.dev0
  • Pytorch 1.10.1+cu102
  • Datasets 1.17.1.dev0
  • Tokenizers 0.11.0