Whisper base Vi - Nam Phung

This model is a fine-tuned version of openai/whisper-base on the vlsp2020_vinai_100h dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3606
  • WER: 16.9148
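The checkpoint can be used directly for Vietnamese speech recognition with the Transformers ASR pipeline. The snippet below is a minimal sketch, assuming the model is published on the Hub as namph204/whisper-base-vi (the repository name shown on this page) and that ffmpeg is available to decode the audio file, whose path is a placeholder:

```python
# Minimal inference sketch. The repo id namph204/whisper-base-vi comes from this
# model page; "sample_vi.wav" is a placeholder for your own audio file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="namph204/whisper-base-vi",
)

# The pipeline decodes the file with ffmpeg and resamples it to the 16 kHz
# sampling rate expected by Whisper before transcription.
result = asr("sample_vi.wav")
print(result["text"])
```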

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 15000
  • mixed_precision_training: Native AMP
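For reference, these settings map roughly onto a Seq2SeqTrainingArguments configuration for the Transformers Trainer. The sketch below is an approximation, not the published training script; the output directory is a placeholder, and the 250-step evaluation interval is inferred from the results table below:

```python
# Sketch of a Seq2SeqTrainingArguments setup matching the listed hyperparameters.
# output_dir is a placeholder; the actual training script is not published here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-base-vi",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",          # AdamW (torch), betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=15000,
    fp16=True,                    # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=250,               # matches the evaluation interval in the results table
    predict_with_generate=True,   # needed to compute WER during evaluation
)
```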

Training results

| Training Loss | Epoch  | Step  | Validation Loss | WER     |
|:-------------:|:------:|:-----:|:---------------:|:-------:|
| 0.7975        | 0.0886 | 250   | 0.7610          | 36.8155 |
| 0.6074        | 0.1772 | 500   | 0.6467          | 32.4870 |
| 0.5934        | 0.2658 | 750   | 0.5843          | 29.5521 |
| 0.5497        | 0.3544 | 1000  | 0.5450          | 26.5531 |
| 0.5559        | 0.4429 | 1250  | 0.5176          | 26.0146 |
| 0.4872        | 0.5315 | 1500  | 0.4967          | 25.8677 |
| 0.5001        | 0.6201 | 1750  | 0.4795          | 25.0705 |
| 0.4597        | 0.7087 | 2000  | 0.4644          | 24.5844 |
| 0.4507        | 0.7973 | 2250  | 0.4536          | 22.6308 |
| 0.4356        | 0.8859 | 2500  | 0.4412          | 22.1019 |
| 0.4589        | 0.9745 | 2750  | 0.4315          | 22.3294 |
| 0.3347        | 1.0631 | 3000  | 0.4250          | 21.2764 |
| 0.3318        | 1.1517 | 3250  | 0.4204          | 20.9716 |
| 0.3473        | 1.2403 | 3500  | 0.4134          | 20.9027 |
| 0.3358        | 1.3288 | 3750  | 0.4097          | 20.2717 |
| 0.3467        | 1.4174 | 4000  | 0.4034          | 20.3648 |
| 0.3325        | 1.5060 | 4250  | 0.3987          | 19.7828 |
| 0.3396        | 1.5946 | 4500  | 0.3938          | 20.0876 |
| 0.3429        | 1.6832 | 4750  | 0.3897          | 18.9360 |
| 0.3347        | 1.7718 | 5000  | 0.3852          | 19.5118 |
| 0.3318        | 1.8604 | 5250  | 0.3816          | 19.1070 |
| 0.3362        | 1.9490 | 5500  | 0.3765          | 19.3152 |
| 0.3083        | 2.0376 | 5750  | 0.3780          | 18.7174 |
| 0.2372        | 2.1262 | 6000  | 0.3779          | 18.7188 |
| 0.2534        | 2.2147 | 6250  | 0.3742          | 18.6181 |
| 0.271         | 2.3033 | 6500  | 0.3729          | 18.5588 |
| 0.2836        | 2.3919 | 6750  | 0.3718          | 18.3712 |
| 0.2648        | 2.4805 | 7000  | 0.3689          | 18.3843 |
| 0.2678        | 2.5691 | 7250  | 0.3665          | 17.6009 |
| 0.2714        | 2.6577 | 7500  | 0.3652          | 17.7202 |
| 0.2504        | 2.7463 | 7750  | 0.3640          | 17.9457 |
| 0.275         | 2.8349 | 8000  | 0.3631          | 17.7382 |
| 0.2538        | 2.9235 | 8250  | 0.3598          | 17.3451 |
| 0.1795        | 3.0120 | 8500  | 0.3612          | 17.2499 |
| 0.1879        | 3.1006 | 8750  | 0.3648          | 17.5003 |
| 0.1947        | 3.1892 | 9000  | 0.3627          | 17.2665 |
| 0.1968        | 3.2778 | 9250  | 0.3620          | 17.0700 |
| 0.1954        | 3.3664 | 9500  | 0.3621          | 17.1148 |
| 0.1921        | 3.4550 | 9750  | 0.3617          | 17.0251 |
| 0.2068        | 3.5436 | 10000 | 0.3601          | 17.2162 |
| 0.2115        | 3.6322 | 10250 | 0.3604          | 17.0293 |
| 0.2242        | 3.7208 | 10500 | 0.3591          | 16.8072 |
| 0.2015        | 3.8094 | 10750 | 0.3574          | 17.0858 |
| 0.2261        | 3.8979 | 11000 | 0.3573          | 16.7017 |
| 0.2129        | 3.9865 | 11250 | 0.3556          | 17.1631 |
| 0.1739        | 4.0751 | 11500 | 0.3603          | 16.8362 |
| 0.1532        | 4.1637 | 11750 | 0.3603          | 16.8603 |
| 0.1408        | 4.2523 | 12000 | 0.3613          | 16.8631 |
| 0.1743        | 4.3409 | 12250 | 0.3604          | 16.8196 |
| 0.1832        | 4.4295 | 12500 | 0.3613          | 16.9534 |
| 0.1688        | 4.5181 | 12750 | 0.3609          | 17.0279 |
| 0.1767        | 4.6067 | 13000 | 0.3595          | 17.1865 |
| 0.1589        | 4.6953 | 13250 | 0.3596          | 16.8824 |
| 0.1778        | 4.7838 | 13500 | 0.3591          | 16.8376 |
| 0.1806        | 4.8724 | 13750 | 0.3590          | 16.8714 |
| 0.1551        | 4.9610 | 14000 | 0.3591          | 16.8231 |
| 0.163         | 5.0496 | 14250 | 0.3598          | 16.9541 |
| 0.1365        | 5.1382 | 14500 | 0.3604          | 16.8079 |
| 0.1563        | 5.2268 | 14750 | 0.3606          | 16.9176 |
| 0.1429        | 5.3154 | 15000 | 0.3606          | 16.9148 |
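The WER column is reported as a percentage, so the final checkpoint's 16.9148 corresponds to a word error rate of roughly 0.169. The snippet below is a small sketch of how such a score can be computed with the evaluate library; the Vietnamese example strings are purely illustrative and not taken from the evaluation set:

```python
# Sketch of a WER computation with the `evaluate` library.
# The example transcripts are illustrative only.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["xin chào các bạn", "hôm nay trời đẹp"]
references  = ["xin chào các bạn", "hôm nay trời rất đẹp"]

# evaluate's "wer" returns a fraction; multiply by 100 to match the table above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```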

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1