metadata
library_name: transformers
language:
  - vi
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
  - speech-to-text
  - vietnamese
  - uit-vimd
  - generated_from_trainer
datasets:
  - uit-vimd
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xlsr-53_030909
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: UIT-ViMD
          type: uit-vimd
        metrics:
          - name: Wer
            type: wer
            value: 0.9996964638033086

wav2vec2-large-xlsr-53_030909

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the UIT-ViMD dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3345
  • WER: 0.9997
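
For readers unfamiliar with the metric, the word error rate reported above is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The following is a minimal sketch of that definition, not the evaluation code used for this model; the function name `word_error_rate` and the sample strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (an assumption, not this model's evaluation code).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition a value close to 1.0, as reported here, means that the number of word-level errors is about equal to the number of reference words. Because insertions count as errors, the metric can also exceed 1.0.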

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
  • mixed_precision_training: Native AMP
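
To make the relationship between these settings concrete, the sketch below derives the effective batch size and evaluates a linear learning-rate schedule with warmup in plain Python. It assumes a single training device and a total of 90 optimizer steps (the last step logged in the training results); both are assumptions rather than values stated in the hyperparameter list, and `linear_schedule_lr` is an illustrative helper, not a library function.

```python
import math


def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list.
train_batch_size = 8
gradient_accumulation_steps = 4
learning_rate = 3e-4
warmup_ratio = 0.1

# Assumptions: one device, 90 optimizer steps in total.
num_devices = 1
total_steps = 90

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
warmup_steps = math.ceil(total_steps * warmup_ratio)

print("total_train_batch_size:", total_train_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, learning_rate, warmup_steps, total_steps))
```

With these assumptions the learning rate peaks at 0.0003 once warmup ends and reaches zero at the final step; a different total step count would shift both points.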

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|---------------|---------|------|-----------------|--------|
| 13.8904       | 0.9231  | 3    | 12.5105         | 0.9998 |
| 10.5144       | 1.6154  | 6    | 12.2960         | 1.0    |
| 9.314         | 2.3077  | 9    | 9.0572          | 0.9997 |
| 7.7092        | 3.0     | 12   | 6.5385          | 0.9997 |
| 7.2508        | 3.9231  | 15   | 5.3754          | 0.9997 |
| 4.2192        | 4.6154  | 18   | 4.6707          | 0.9997 |
| 3.7228        | 5.3077  | 21   | 4.2868          | 0.9997 |
| 3.4506        | 6.0     | 24   | 4.0406          | 0.9997 |
| 4.1163        | 6.9231  | 27   | 3.8480          | 0.9997 |
| 2.8937        | 7.6154  | 30   | 3.6930          | 0.9997 |
| 2.8004        | 8.3077  | 33   | 3.5770          | 0.9997 |
| 2.6908        | 9.0     | 36   | 3.4962          | 0.9997 |
| 3.5181        | 9.9231  | 39   | 3.4494          | 0.9997 |
| 2.6012        | 10.6154 | 42   | 3.4272          | 0.9997 |
| 2.5532        | 11.3077 | 45   | 3.3969          | 0.9997 |
| 2.5429        | 12.0    | 48   | 3.3711          | 0.9997 |
| 3.3727        | 12.9231 | 51   | 3.3724          | 0.9997 |
| 2.5283        | 13.6154 | 54   | 3.3591          | 0.9997 |
| 2.5149        | 14.3077 | 57   | 3.3542          | 0.9997 |
| 2.5217        | 15.0    | 60   | 3.3539          | 0.9997 |
| 3.3412        | 15.9231 | 63   | 3.3400          | 0.9997 |
| 2.5332        | 16.6154 | 66   | 3.3409          | 0.9998 |
| 2.4927        | 17.3077 | 69   | 3.3521          | 0.9997 |
| 2.5114        | 18.0    | 72   | 3.3541          | 0.9997 |
| 3.3431        | 18.9231 | 75   | 3.3576          | 0.9997 |
| 2.5085        | 19.6154 | 78   | 3.3484          | 0.9997 |
| 2.5107        | 20.3077 | 81   | 3.3388          | 0.9997 |
| 2.4972        | 21.0    | 84   | 3.3349          | 0.9998 |
| 3.3369        | 21.9231 | 87   | 3.3343          | 0.9997 |
| 2.5042        | 22.6154 | 90   | 3.3345          | 0.9997 |

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0