---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---
Whisper large V3 Urdu ASR Model 🥇
This model is a fine-tuned version of openai/whisper-large-v3 on the Urdu (ur) configuration of the mozilla-foundation/common_voice_17_0 dataset. At the final training step (1500), it achieves the following results on the evaluation set:
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975
Quick Usage
from transformers import pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu",
)
transcriber.model.generation_config.forced_decoder_ids = None  # clear any forced decoder prompt
transcriber.model.generation_config.language = "ur"  # decode as Urdu
transcription = transcriber("audio2.mp3")  # path to a local audio file
print(transcription)  # a dict with a "text" key, as shown on the next line
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
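The batch-size and scheduler entries above interact, and a few lines of arithmetic make that concrete. The sketch below assumes a single training device (the card does not state the device count) and re-implements the usual linear-warmup-then-cosine-decay formula by hand rather than calling the library scheduler. The names `NUM_DEVICES` and `lr_at_step` are illustrative and not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1500

# Assumption: one device, so the effective batch size is
# per-device batch size x gradient accumulation steps.
NUM_DEVICES = 1
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size)
    for step in (0, 50, 100, 800, 1500):
        print(step, lr_at_step(step))
```

Under these assumptions the learning rate peaks at step 100 and has fallen to half its peak by step 800, the midpoint of the decay phase.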
Training results
Training Loss | Epoch | Step | Validation Loss | WER (%) | CER (%) |
---|---|---|---|---|---|
0.0261 | 0.5089 | 300 | 0.0254 | 30.0224 | 10.3646 |
0.0211 | 1.0170 | 600 | 0.0226 | 25.8588 | 8.5780 |
0.0121 | 1.5259 | 900 | 0.0206 | 24.2158 | 7.9412 |
0.0093 | 2.0339 | 1200 | 0.0195 | 21.3032 | 7.2018 |
0.0043 | 2.5428 | 1500 | 0.0204 | 21.4712 | 7.1975 |
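Reading the table, validation loss and WER both reach their lowest values at step 1200 and rise slightly by step 1500, while CER keeps improving marginally. A small sketch that makes this comparison explicit follows; the tuple layout and the helper `best_by` are illustrative and not part of the training script.

```python
# Rows copied from the training results table:
# (step, validation_loss, wer, cer)
RESULTS = [
    (300, 0.0254, 30.0224, 10.3646),
    (600, 0.0226, 25.8588, 8.5780),
    (900, 0.0206, 24.2158, 7.9412),
    (1200, 0.0195, 21.3032, 7.2018),
    (1500, 0.0204, 21.4712, 7.1975),
]


def best_by(rows, column):
    """Return the row with the smallest value in the given column index."""
    return min(rows, key=lambda row: row[column])


best_loss_row = best_by(RESULTS, 1)
best_wer_row = best_by(RESULTS, 2)
best_cer_row = best_by(RESULTS, 3)

# Relative WER reduction between the first evaluation and the best one.
first_wer = RESULTS[0][2]
relative_wer_reduction = (first_wer - best_wer_row[2]) / first_wer

if __name__ == "__main__":
    print("lowest validation loss at step", best_loss_row[0])
    print("lowest WER at step", best_wer_row[0])
    print("lowest CER at step", best_cer_row[0])
    print(f"relative WER reduction: {relative_wer_reduction:.1%}")
```

The headline evaluation figures on this card (loss 0.0204, WER 21.4712, CER 7.1975) match the step 1500 row, not the step 1200 row.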
Framework versions
- Transformers 4.52.2
- Pytorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2
Evaluation
Urdu ASR Evaluation on Common Voice 17.0 (Test Split).
Metric | Value | Description |
---|---|---|
WER | 26.019% | Word Error Rate (lower is better) |
CER | 9.426% | Character Error Rate (lower is better) |
BLEU | 59.446 | BLEU score on a 0-100 scale (higher is better) |
ChrF | 82.902 | Character n-gram F-score (higher is better) |
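For readers unfamiliar with the first two metrics, WER and CER are both edit distances normalised by the reference length, computed over words and characters respectively. The following is a minimal pure-Python sketch of that definition. It is not the author's testing script, which may use a metrics library and apply text normalisation, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference item missing from hypothesis)
                current[j - 1] + 1,      # insertion (extra item in hypothesis)
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single non-empty reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent for a single non-empty reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the dog sat"))
    print(cer("abcd", "abed"))
```

Corpus-level scores are typically obtained by summing edit counts and reference lengths over all utterances before dividing, so averaging these per-utterance values would not necessarily reproduce the figures in the table.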
👉 Review the testing script: Testing Whisper Large V3 Urdu