Mizo Automatic Speech Recognition

This model is a fine-tuned version of facebook/wav2vec2-base for Mizo automatic speech recognition, trained on the MiZonal v1.0 dataset (see Training and evaluation data below). It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.1614
  • WER: 0.1659
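
To transcribe audio with this checkpoint, the minimal sketch below assumes the standard transformers CTC pipeline for wav2vec2 models, a 16 kHz mono input, and a placeholder file path; it is not taken from the training code.

```python
# Minimal inference sketch (assumptions: 16 kHz mono audio, placeholder path).
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "andrewbawitlung/wav2vec2-base-mizo-lus-v25"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# wav2vec2-base was pretrained on 16 kHz audio, so resample the input.
speech, _ = librosa.load("mizo_sample.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: argmax over the vocabulary at each frame;
# batch_decode collapses repeats and strips blank tokens.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```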

Citation

BibTeX entry and citation info:

@article{10.1145/3746063,
author = {Bawitlung, Andrew and Dash, Sandeep Kumar and Pattanayak, Radha Mohan},
title = {Mizo Automatic Speech Recognition: Leveraging Wav2vec 2.0 and XLS-R for Enhanced Accuracy in Low-Resource Language Processing},
year = {2025},
url = {https://doi.org/10.1145/3746063},
doi = {10.1145/3746063},
journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
month = jun,
}

Training and evaluation data

The model was trained and evaluated on the MiZonal v1.0 Mizo speech dataset.
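
The card does not include a loading script for MiZonal v1.0, so the preprocessing sketch below is generic: it assumes the corpus is available as a datasets table with "audio" and "sentence" columns (both names are hypothetical) and shows the usual CTC feature and label preparation.

```python
# Generic CTC preprocessing sketch; the "audio"/"sentence" column names and
# the availability of MiZonal v1.0 as a `datasets` dataset are assumptions.
from datasets import Audio

def prepare(batch, processor):
    audio = batch["audio"]
    # Extract input_values from the waveform at the expected 16 kHz rate.
    batch["input_values"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_values[0]
    # Tokenize the transcript into CTC label ids.
    batch["labels"] = processor(text=batch["sentence"]).input_ids
    return batch

# dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
# dataset = dataset.map(prepare, fn_kwargs={"processor": processor})
```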

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 49
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 28
  • mixed_precision_training: Native AMP
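
For reference, the list above maps onto transformers.TrainingArguments roughly as follows. The output_dir and the evaluation/logging cadence are assumptions (the results table suggests evaluation every 100 steps and loss logging every 200); the remaining values mirror the reported hyperparameters.

```python
# Hedged reconstruction of the reported hyperparameters; output_dir and the
# eval/logging cadence are assumptions, not taken from the training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-mizo-lus-v25",  # assumed name
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 16 * 4 = total train batch size of 64
    seed=49,
    num_train_epochs=28,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    fp16=True,                      # Native AMP mixed precision
    evaluation_strategy="steps",    # assumed from the 100-step eval cadence
    eval_steps=100,
    logging_steps=200,              # training loss appears every 200 steps
)
# Adam betas (0.9, 0.999) and epsilon 1e-08 are the TrainingArguments defaults.
```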

Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log        | 0.72  | 100  | 2.9278          | 1.0    |
| 3.6344        | 1.45  | 200  | 2.8377          | 1.0    |
| 3.6344        | 2.17  | 300  | 2.0445          | 0.9965 |
| 2.1941        | 2.9   | 400  | 0.9115          | 0.7317 |
| 2.1941        | 3.62  | 500  | 0.6427          | 0.5815 |
| 1.0173        | 4.35  | 600  | 0.5384          | 0.5008 |
| 1.0173        | 5.07  | 700  | 0.4707          | 0.4641 |
| 0.7632        | 5.8   | 800  | 0.3804          | 0.4103 |
| 0.7632        | 6.52  | 900  | 0.3635          | 0.3750 |
| 0.6463        | 7.25  | 1000 | 0.3351          | 0.3670 |
| 0.6463        | 7.97  | 1100 | 0.2953          | 0.3336 |
| 0.5674        | 8.7   | 1200 | 0.2711          | 0.3065 |
| 0.5674        | 9.42  | 1300 | 0.2527          | 0.2877 |
| 0.4916        | 10.14 | 1400 | 0.2403          | 0.2823 |
| 0.4916        | 10.87 | 1500 | 0.2352          | 0.2717 |
| 0.442         | 11.59 | 1600 | 0.2312          | 0.2639 |
| 0.442         | 12.32 | 1700 | 0.2251          | 0.2517 |
| 0.4056        | 13.04 | 1800 | 0.1932          | 0.2275 |
| 0.4056        | 13.77 | 1900 | 0.2013          | 0.2294 |
| 0.3726        | 14.49 | 2000 | 0.1954          | 0.2226 |
| 0.3726        | 15.22 | 2100 | 0.1957          | 0.2175 |
| 0.3426        | 15.94 | 2200 | 0.2045          | 0.2107 |
| 0.3426        | 16.67 | 2300 | 0.2003          | 0.2127 |
| 0.3275        | 17.39 | 2400 | 0.1933          | 0.2023 |
| 0.3275        | 18.12 | 2500 | 0.1859          | 0.2006 |
| 0.3112        | 18.84 | 2600 | 0.1821          | 0.1909 |
| 0.3112        | 19.57 | 2700 | 0.1756          | 0.1888 |
| 0.293         | 20.29 | 2800 | 0.1761          | 0.1865 |
| 0.293         | 21.01 | 2900 | 0.1748          | 0.1990 |
| 0.2684        | 21.74 | 3000 | 0.1694          | 0.1788 |
| 0.2684        | 22.46 | 3100 | 0.1745          | 0.1778 |
| 0.2502        | 23.19 | 3200 | 0.1726          | 0.1739 |
| 0.2502        | 23.91 | 3300 | 0.1699          | 0.1708 |
| 0.2435        | 24.64 | 3400 | 0.1670          | 0.1701 |
| 0.2435        | 25.36 | 3500 | 0.1616          | 0.1714 |
| 0.2337        | 26.09 | 3600 | 0.1609          | 0.1689 |
| 0.2337        | 26.81 | 3700 | 0.1615          | 0.1680 |
| 0.2266        | 27.54 | 3800 | 0.1614          | 0.1659 |
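
The WER column is the word error rate on the evaluation set. The sketch below shows how such a metric is typically computed for a CTC model with the evaluate library; it is an assumption that the training script did it exactly this way.

```python
# Typical WER computation for a CTC Trainer; assumes -100 marks label padding.
# In a real Trainer setup, bind `processor` with functools.partial, since
# compute_metrics receives only the EvalPrediction.
import numpy as np
import evaluate

wer_metric = evaluate.load("wer")

def compute_metrics(pred, processor):
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # Replace -100 padding with the pad token id so decoding works.
    labels = np.where(
        pred.label_ids == -100, processor.tokenizer.pad_token_id, pred.label_ids
    )
    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(labels, group_tokens=False)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}
```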

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1