Mizo Automatic Speech Recognition

This model is a fine-tuned version of facebook/wav2vec2-base for Mizo automatic speech recognition, trained on the MiZonal v1.0 dataset (see Training and evaluation data below). It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.1614
  • WER: 0.1659
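
To transcribe audio with this checkpoint, the minimal sketch below assumes the standard transformers CTC pipeline for wav2vec2 models, a 16 kHz mono input, and a placeholder file path; it is not taken from the training code.

```python
# Minimal inference sketch (assumptions: 16 kHz mono audio, placeholder path).
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "andrewbawitlung/wav2vec2-base-mizo-lus-v25"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# wav2vec2-base was pretrained on 16 kHz audio, so resample the input.
speech, _ = librosa.load("mizo_sample.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: argmax over the vocabulary at each frame;
# batch_decode collapses repeats and strips blank tokens.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```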

Citation

BibTeX entry and citation info:

@article{10.1145/3746063,
author = {Bawitlung, Andrew and Dash, Sandeep Kumar and Pattanayak, Radha Mohan},
title = {Mizo Automatic Speech Recognition: Leveraging Wav2vec 2.0 and XLS-R for Enhanced Accuracy in Low-Resource Language Processing},
year = {2025},
url = {https://doi.org/10.1145/3746063},
doi = {10.1145/3746063},
journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
month = jun,
}

Training and evaluation data

The model was trained and evaluated on the MiZonal v1.0 Mizo speech dataset.
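
The card does not include a loading script for MiZonal v1.0, so the preprocessing sketch below is generic: it assumes the corpus is available as a datasets table with "audio" and "sentence" columns (both names are hypothetical) and shows the usual CTC feature and label preparation.

```python
# Generic CTC preprocessing sketch; the "audio"/"sentence" column names and
# the availability of MiZonal v1.0 as a `datasets` dataset are assumptions.
from datasets import Audio

def prepare(batch, processor):
    audio = batch["audio"]
    # Extract input_values from the waveform at the expected 16 kHz rate.
    batch["input_values"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_values[0]
    # Tokenize the transcript into CTC label ids.
    batch["labels"] = processor(text=batch["sentence"]).input_ids
    return batch

# dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
# dataset = dataset.map(prepare, fn_kwargs={"processor": processor})
```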

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 49
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 28
  • mixed_precision_training: Native AMP
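
For reference, the list above maps onto transformers.TrainingArguments roughly as follows. The output_dir and the evaluation/logging cadence are assumptions (the results table suggests evaluation every 100 steps and loss logging every 200); the remaining values mirror the reported hyperparameters.

```python
# Hedged reconstruction of the reported hyperparameters; output_dir and the
# eval/logging cadence are assumptions, not taken from the training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-mizo-lus-v25",  # assumed name
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 16 * 4 = total train batch size of 64
    seed=49,
    num_train_epochs=28,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    fp16=True,                      # Native AMP mixed precision
    evaluation_strategy="steps",    # assumed from the 100-step eval cadence
    eval_steps=100,
    logging_steps=200,              # training loss appears every 200 steps
)
# Adam betas (0.9, 0.999) and epsilon 1e-08 are the TrainingArguments defaults.
```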

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log        | 0.72  | 100  | 2.9278          | 1.0    |
| 3.6344        | 1.45  | 200  | 2.8377          | 1.0    |
| 3.6344        | 2.17  | 300  | 2.0445          | 0.9965 |
| 2.1941        | 2.9   | 400  | 0.9115          | 0.7317 |
| 2.1941        | 3.62  | 500  | 0.6427          | 0.5815 |
| 1.0173        | 4.35  | 600  | 0.5384          | 0.5008 |
| 1.0173        | 5.07  | 700  | 0.4707          | 0.4641 |
| 0.7632        | 5.8   | 800  | 0.3804          | 0.4103 |
| 0.7632        | 6.52  | 900  | 0.3635          | 0.3750 |
| 0.6463        | 7.25  | 1000 | 0.3351          | 0.3670 |
| 0.6463        | 7.97  | 1100 | 0.2953          | 0.3336 |
| 0.5674        | 8.7   | 1200 | 0.2711          | 0.3065 |
| 0.5674        | 9.42  | 1300 | 0.2527          | 0.2877 |
| 0.4916        | 10.14 | 1400 | 0.2403          | 0.2823 |
| 0.4916        | 10.87 | 1500 | 0.2352          | 0.2717 |
| 0.442         | 11.59 | 1600 | 0.2312          | 0.2639 |
| 0.442         | 12.32 | 1700 | 0.2251          | 0.2517 |
| 0.4056        | 13.04 | 1800 | 0.1932          | 0.2275 |
| 0.4056        | 13.77 | 1900 | 0.2013          | 0.2294 |
| 0.3726        | 14.49 | 2000 | 0.1954          | 0.2226 |
| 0.3726        | 15.22 | 2100 | 0.1957          | 0.2175 |
| 0.3426        | 15.94 | 2200 | 0.2045          | 0.2107 |
| 0.3426        | 16.67 | 2300 | 0.2003          | 0.2127 |
| 0.3275        | 17.39 | 2400 | 0.1933          | 0.2023 |
| 0.3275        | 18.12 | 2500 | 0.1859          | 0.2006 |
| 0.3112        | 18.84 | 2600 | 0.1821          | 0.1909 |
| 0.3112        | 19.57 | 2700 | 0.1756          | 0.1888 |
| 0.293         | 20.29 | 2800 | 0.1761          | 0.1865 |
| 0.293         | 21.01 | 2900 | 0.1748          | 0.1990 |
| 0.2684        | 21.74 | 3000 | 0.1694          | 0.1788 |
| 0.2684        | 22.46 | 3100 | 0.1745          | 0.1778 |
| 0.2502        | 23.19 | 3200 | 0.1726          | 0.1739 |
| 0.2502        | 23.91 | 3300 | 0.1699          | 0.1708 |
| 0.2435        | 24.64 | 3400 | 0.1670          | 0.1701 |
| 0.2435        | 25.36 | 3500 | 0.1616          | 0.1714 |
| 0.2337        | 26.09 | 3600 | 0.1609          | 0.1689 |
| 0.2337        | 26.81 | 3700 | 0.1615          | 0.1680 |
| 0.2266        | 27.54 | 3800 | 0.1614          | 0.1659 |
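
The WER column is the word error rate on the evaluation set. The sketch below shows how such a metric is typically computed for a CTC model with the evaluate library; it is an assumption that the training script did it exactly this way.

```python
# Typical WER computation for a CTC Trainer; assumes -100 marks label padding.
# In a real Trainer setup, bind `processor` with functools.partial, since
# compute_metrics receives only the EvalPrediction.
import numpy as np
import evaluate

wer_metric = evaluate.load("wer")

def compute_metrics(pred, processor):
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # Replace -100 padding with the pad token id so decoding works.
    labels = np.where(
        pred.label_ids == -100, processor.tokenizer.pad_token_id, pred.label_ids
    )
    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(labels, group_tokens=False)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}
```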

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1