Whisper Small Hy - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data with CV 20.0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1185
  • Wer: 26.9720

Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.

Intended uses & limitations

Intended Uses:

  • Armenian speech-to-text applications
  • Research on ASR for low-resource languages
  • Educational and experimental projects involving Whisper models

Limitations:

  • May not generalize well to accents or noisy audio not represented in the training set
  • The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training

Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

| Split | # Clips | Duration (hours) |
|---|---|---|
| train | 9,300 | 13.53 |
| test | 5,818 | 9.16 |
| eval | 5,856 | 8.76 |
| generated | 100,000 | 113.61 |

Total duration: ~145 hours
Training set duration (train + generated): ~127 hours
Test set duration (test + eval): ~18 hours
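As a quick arithmetic check, the per-split hours in the table add up to the totals quoted above:

```python
# Hours per split, taken from the dataset table in this card.
durations = {"train": 13.53, "test": 9.16, "eval": 8.76, "generated": 113.61}

train_total = durations["train"] + durations["generated"]  # audio seen in training
test_total = durations["test"] + durations["eval"]         # held-out audio
grand_total = sum(durations.values())

print(round(train_total, 2), round(test_total, 2), round(grand_total, 2))
# → 127.14 17.92 145.06
```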

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 15000
  • mixed_precision_training: Native AMP
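The linear scheduler with 500 warmup steps ramps the learning rate up to 1e-05 and then decays it to zero at step 15,000. A minimal sketch of that shape (not the trainer's actual implementation, which lives in `transformers`):

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=15000):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# Peak LR is reached at the end of warmup; the final step decays to zero.
print(linear_lr(250), linear_lr(500), linear_lr(15000))
```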

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.1118 | 0.1464 | 1000 | 0.2371 | 48.5991 |
| 0.0959 | 0.2927 | 2000 | 0.1895 | 41.1675 |
| 0.0862 | 0.4391 | 3000 | 0.1716 | 38.6837 |
| 0.0741 | 0.5855 | 4000 | 0.1572 | 35.3540 |
| 0.0708 | 0.7319 | 5000 | 0.1443 | 33.0242 |
| 0.0558 | 0.8782 | 6000 | 0.1352 | 31.4380 |
| 0.0467 | 1.0246 | 7000 | 0.1315 | 30.2390 |
| 0.0528 | 1.1710 | 8000 | 0.1295 | 29.9233 |
| 0.0455 | 1.3173 | 9000 | 0.1280 | 29.2490 |
| 0.0347 | 1.4637 | 10000 | 0.1246 | 28.9718 |
| 0.049 | 1.6101 | 11000 | 0.1221 | 28.5274 |
| 0.0419 | 1.7564 | 12000 | 0.1189 | 27.9543 |
| 0.0371 | 1.9028 | 13000 | 0.1166 | 27.5242 |
| 0.0286 | 2.0492 | 14000 | 0.1173 | 27.0149 |
| 0.0301 | 2.1956 | 15000 | 0.1185 | 26.9720 |
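The WER values above are word error rates in percent: word-level edit distance divided by the number of reference words. A minimal reimplementation for illustration (the card's numbers came from the evaluation pipeline, not this snippet):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level Levenshtein distance
    (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[-1][-1] / len(ref)

print(wer("a b c d", "a x c"))  # 1 substitution + 1 deletion over 4 words → 50.0
```

A WER of 26.97 therefore means roughly one word-level error per four reference words on the evaluation set.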

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1