---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---


# Whisper Large V3 Urdu ASR Model 🥇

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the Urdu (`ur`) configuration of the mozilla-foundation/common_voice_17_0 dataset.
At the final checkpoint (step 1500) it achieves the following results on the evaluation set:
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975


## Quick Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-urdu",
)

# Clear any forced decoder prompt and pin the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file.
transcription = transcriber("audio2.mp3")
print(transcription)
```

```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```
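
For recordings longer than Whisper's 30-second window, the language can also be passed per call instead of being set on the generation config. The following is a minimal sketch, assuming `transcriber` is the pipeline created above; the helper names `urdu_generate_kwargs` and `transcribe_urdu` are hypothetical and not part of this repository.

```python
def urdu_generate_kwargs():
    """Generation options that pin Whisper to Urdu transcription."""
    return {"language": "ur", "task": "transcribe"}


def transcribe_urdu(transcriber, audio_path, chunk_length_s=30):
    """Transcribe one file with an existing ASR pipeline.

    `transcriber` is expected to be a transformers
    automatic-speech-recognition pipeline. `chunk_length_s` asks the
    pipeline to split long audio into chunks of that many seconds.
    """
    return transcriber(
        audio_path,
        chunk_length_s=chunk_length_s,
        generate_kwargs=urdu_generate_kwargs(),
    )
```

With the pipeline from the snippet above, `transcribe_urdu(transcriber, "audio2.mp3")` returns the same kind of `{'text': ...}` dictionary.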


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
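
As a quick sanity check on the list above, the total batch size follows from the per-device batch size and the gradient accumulation steps. The sketch below assumes single-device training, which the card does not state but which is consistent with the reported total of 16.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 2
training_steps = 1500

# Assumption: one device; the card does not say how many were used.
num_devices = 1

# Examples consumed per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Examples consumed over the whole run (with repetition across epochs).
examples_seen = total_train_batch_size * training_steps

print(total_train_batch_size)  # 16
print(examples_seen)  # 24000
```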

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     | CER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0261        | 0.5089 | 300  | 0.0254          | 30.0224 | 10.3646 |
| 0.0211        | 1.0170 | 600  | 0.0226          | 25.8588 | 8.5780  |
| 0.0121        | 1.5259 | 900  | 0.0206          | 24.2158 | 7.9412  |
| 0.0093        | 2.0339 | 1200 | 0.0195          | 21.3032 | 7.2018  |
| 0.0043        | 2.5428 | 1500 | 0.0204          | 21.4712 | 7.1975  |
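
The table can be read programmatically to see which logged checkpoint scores best on each metric and roughly how many optimizer steps make up one epoch. This is only an illustrative sketch over the five rows above; the steps-per-epoch figure is an estimate derived from the Epoch and Step columns, not a value reported by the Trainer.

```python
# (step, epoch, validation_loss, wer, cer), copied from the table above.
rows = [
    (300, 0.5089, 0.0254, 30.0224, 10.3646),
    (600, 1.0170, 0.0226, 25.8588, 8.5780),
    (900, 1.5259, 0.0206, 24.2158, 7.9412),
    (1200, 2.0339, 0.0195, 21.3032, 7.2018),
    (1500, 2.5428, 0.0204, 21.4712, 7.1975),
]

# Lower is better for all three quantities.
best_loss = min(rows, key=lambda r: r[2])
best_wer = min(rows, key=lambda r: r[3])
best_cer = min(rows, key=lambda r: r[4])

# Estimate steps per epoch from each row, then average the estimates.
estimates = [step / epoch for step, epoch, *_ in rows]
mean_steps_per_epoch = sum(estimates) / len(estimates)

print(best_loss[0])  # 1200
print(best_wer[0])  # 1200
print(best_cer[0])  # 1500
print(round(mean_steps_per_epoch))  # 590
```

Step 1200 has the lowest validation loss and WER, while the final step 1500 has a marginally lower CER.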


### Framework versions

- Transformers 4.52.2
- Pytorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2

---

## Evaluation

Results on the Urdu (`ur`) test split of Common Voice 17.0. These are test-split scores, separate from the evaluation-set figures logged during training.

| Metric   | Value   | Description                                 |
|----------|---------|---------------------------------------------|
| **WER**  | 26.019% | Word Error Rate (lower is better)           |
| **CER**  | 9.426%  | Character Error Rate (lower is better)      |
| **BLEU** | 59.446  | BLEU score (higher is better)               |
| **ChrF** | 82.902  | Character n-gram F-score (higher is better) |

> 👉 Review the testing script: [Testing Whisper Large V3 Urdu](https://www.kaggle.com/code/kingabzpro/testing-urdu-whisper-large-v3)
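
For readers unfamiliar with the two error-rate metrics, the sketch below shows how WER and CER are defined in terms of edit distance. It is a plain-Python illustration, not the code used in the linked testing script, which may rely on a metrics library and apply text normalization. It assumes whitespace tokenization for words and counts spaces as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Example: the hypothesis drops the final word of the reference.
reference = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
hypothesis = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی"
print(round(wer(reference, hypothesis), 2))
print(round(cer(reference, hypothesis), 2))
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.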