---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---
# Whisper large V3 Urdu ASR Model 🥇
This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), trained on the Urdu (`ur`) subset of the Common Voice 17.0 dataset (`mozilla-foundation/common_voice_17_0`).
It achieves the following results on the evaluation set at the final checkpoint (step 1500):
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975
## Quick Usage
```python
from transformers import pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-urdu"
)
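# Clear any forced decoder prompt so the language setting below takes effect.
transcriber.model.generation_config.forced_decoder_ids = None
# Decode as Urdu instead of relying on automatic language detection.
transcriber.model.generation_config.language = "ur"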
transcription = transcriber("audio2.mp3")
print(transcription)
```
```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
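
To make the schedule concrete, the sketch below derives the effective batch size and the learning rate at a given step from the values listed above. It assumes a single training device and the common "linear warmup, then half-cosine decay to zero" shape; the helper names `effective_batch_size` and `lr_at` are illustrative and are not part of the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-05
WARMUP_STEPS = 100
TRAINING_STEPS = 1500
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 50, 100, 800, 1500):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate peaks at 3e-05 when warmup ends at step 100 and has fallen to half that value by step 800, the midpoint of the decay phase.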
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0261 | 0.5089 | 300 | 0.0254 | 30.0224 | 10.3646 |
| 0.0211 | 1.0170 | 600 | 0.0226 | 25.8588 | 8.5780 |
| 0.0121 | 1.5259 | 900 | 0.0206 | 24.2158 | 7.9412 |
| 0.0093 | 2.0339 | 1200 | 0.0195 | 21.3032 | 7.2018 |
| 0.0043 | 2.5428 | 1500 | 0.0204 | 21.4712 | 7.1975 |
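
Training loss keeps falling through step 1500, but the validation metrics do not all improve together. The short sketch below, with rows copied from the table, picks the best checkpoint per metric. The `Checkpoint` tuple and `best_by` helper are hypothetical names used only for illustration.

```python
from collections import namedtuple

Checkpoint = namedtuple(
    "Checkpoint", "training_loss epoch step validation_loss wer cer"
)

# Rows copied verbatim from the training results table.
CHECKPOINTS = [
    Checkpoint(0.0261, 0.5089, 300, 0.0254, 30.0224, 10.3646),
    Checkpoint(0.0211, 1.0170, 600, 0.0226, 25.8588, 8.5780),
    Checkpoint(0.0121, 1.5259, 900, 0.0206, 24.2158, 7.9412),
    Checkpoint(0.0093, 2.0339, 1200, 0.0195, 21.3032, 7.2018),
    Checkpoint(0.0043, 2.5428, 1500, 0.0204, 21.4712, 7.1975),
]


def best_by(metric: str, checkpoints=CHECKPOINTS) -> Checkpoint:
    """Return the checkpoint with the lowest value of `metric` (all are lower-is-better)."""
    return min(checkpoints, key=lambda c: getattr(c, metric))


if __name__ == "__main__":
    for metric in ("validation_loss", "wer", "cer"):
        print(f"best {metric}: step {best_by(metric).step}")
```

Validation loss and WER are lowest at step 1200, while CER is marginally lower at step 1500. The differences between the last two checkpoints are small.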
### Framework versions
- Transformers 4.52.2
- Pytorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2
---
## Evaluation
Urdu ASR Evaluation on Common Voice 17.0 (Test Split).
| Metric | Value | Description |
|--------|----------|------------------------------------|
| **WER** | 26.019% | Word Error Rate (lower is better) |
| **CER** | 9.426% | Character Error Rate (lower is better) |
| **BLEU** | 59.446 | BLEU Score (higher is better) |
| **ChrF** | 82.902 | Character n-gram F-score (higher is better) |
>👉 Review the testing script: [Testing Whisper Large V3 Urdu](https://www.kaggle.com/code/kingabzpro/testing-urdu-whisper-large-v3)
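
For readers unfamiliar with the error-rate metrics, here is a minimal, dependency-free sketch of how WER and CER can be computed for one utterance. It is not the linked testing script: it assumes whitespace tokenization, no text normalization, and counts spaces as characters for CER, so its numbers may differ from those reported above.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitute, insert, delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
    hypothesis = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی"  # last word dropped
    print(f"WER: {wer(reference, hypothesis):.2f}%")
    print(f"CER: {cer(reference, hypothesis):.2f}%")
```

Corpus-level scores are usually obtained by summing edits and reference lengths over all utterances rather than averaging per-utterance rates.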