---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---


# Whisper large V3 Urdu ASR Model 🥇

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the Urdu (`ur`) subset of the Common Voice 17.0 dataset.
At the final training step (1500), it achieves the following results on the evaluation set:
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975


## Quick Usage

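```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline.
# The model ID follows this card's name (whisper-large-v3-urdu);
# the original snippet pointed at the separate "turbo" variant.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-urdu",
)

# Clear any forced decoder prompt and set the language to Urdu,
# so the model does not have to auto-detect the language.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Path to a local audio file; replace with your own recording.
transcription = transcriber("audio2.mp3")
print(transcription)
```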

```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
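
To make the interaction of these settings concrete, here is a minimal pure-Python sketch of the effective batch size and the learning-rate schedule they imply. It assumes a single training device (consistent with 8 × 2 = 16) and the common form of cosine scheduling: a linear warmup followed by a half-cosine decay to zero. The function names `effective_batch_size` and `lr_at` are illustrative and not part of the training code.

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 3e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1500
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: single device, which matches total_train_batch_size = 16


def effective_batch_size() -> int:
    """Number of samples contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the base learning rate down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 50, 100, 800, 1500):
        print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 3e-05 at step 100 and has fallen to half of that by step 800, the midpoint of the decay phase.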

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer     | Cer     |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0261        | 0.5089 | 300  | 0.0254          | 30.0224 | 10.3646 |
| 0.0211        | 1.0170 | 600  | 0.0226          | 25.8588 | 8.5780  |
| 0.0121        | 1.5259 | 900  | 0.0206          | 24.2158 | 7.9412  |
| 0.0093        | 2.0339 | 1200 | 0.0195          | 21.3032 | 7.2018  |
| 0.0043        | 2.5428 | 1500 | 0.0204          | 21.4712 | 7.1975  |
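
The last two checkpoints are very close, and which one is "best" depends on the metric. The short sketch below copies the rows from the table above and picks the best step per metric; the `checkpoints` list and the `best_step` helper are illustrative only.

```python
# Rows copied from the training results table above.
checkpoints = [
    {"step": 300, "val_loss": 0.0254, "wer": 30.0224, "cer": 10.3646},
    {"step": 600, "val_loss": 0.0226, "wer": 25.8588, "cer": 8.5780},
    {"step": 900, "val_loss": 0.0206, "wer": 24.2158, "cer": 7.9412},
    {"step": 1200, "val_loss": 0.0195, "wer": 21.3032, "cer": 7.2018},
    {"step": 1500, "val_loss": 0.0204, "wer": 21.4712, "cer": 7.1975},
]


def best_step(metric: str) -> int:
    """Return the step with the lowest value of the given metric (lower is better)."""
    return min(checkpoints, key=lambda row: row[metric])["step"]


if __name__ == "__main__":
    for metric in ("val_loss", "wer", "cer"):
        print(f"best {metric}: step {best_step(metric)}")
```

Step 1200 has the lowest validation loss and WER, while step 1500 has a marginally lower CER.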


### Framework versions

- Transformers 4.52.2
- Pytorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2

---

## Evaluation

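The model was evaluated on the Urdu test split of Common Voice 17.0. These scores come from a different split than the evaluation-set results reported during training, so the two sets of numbers are not directly comparable.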

| Metric   | Value   | Description                                 |
|----------|---------|---------------------------------------------|
| **WER**  | 26.019% | Word Error Rate (lower is better)           |
| **CER**  | 9.426%  | Character Error Rate (lower is better)      |
| **BLEU** | 59.446  | BLEU score (higher is better)               |
| **ChrF** | 82.902  | Character n-gram F-score (higher is better) |

>👉 Review the testing script: [Testing Whisper Large V3 Urdu](https://www.kaggle.com/code/kingabzpro/testing-urdu-whisper-large-v3)
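
For readers unfamiliar with the two error-rate metrics, the following is a minimal pure-Python sketch of how WER and CER are defined: the edit distance between reference and hypothesis, divided by the reference length, at word and character level respectively. It is an illustration only. The linked testing script may use a metrics library and apply text normalization, so this sketch is not expected to reproduce the reported numbers exactly. The helper names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (of words or of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference length in words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, relative to the reference length in characters."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Illustrative strings: the hypothesis drops one of the five reference words.
    reference = "پانی کب تک بہتا ہے"
    hypothesis = "پانی کب بہتا ہے"
    print(f"WER: {wer(reference, hypothesis):.1f}%")
    print(f"CER: {cer(reference, hypothesis):.1f}%")
```

Because one dropped word costs a full word error but only a few character errors, CER is typically lower than WER on the same transcripts, which matches the pattern in the table above.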