---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---

# Whisper large V3 Urdu ASR Model 🥇

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the Urdu (`ur`) configuration of the common_voice_17_0 dataset.
It achieves the following results on the evaluation set at the final training step (1500):
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975

## Quick Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu"
)

# Clear the forced decoder IDs and set the transcription language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

transcription = transcriber("audio2.mp3")
print(transcription)
```

```sh
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16 (train_batch_size 8 × gradient_accumulation_steps 2)
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     | CER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0261        | 0.5089 | 300  | 0.0254          | 30.0224 | 10.3646 |
| 0.0211        | 1.0170 | 600  | 0.0226          | 25.8588 | 8.5780  |
| 0.0121        | 1.5259 | 900  | 0.0206          | 24.2158 | 7.9412  |
| 0.0093        | 2.0339 | 1200 | 0.0195          | 21.3032 | 7.2018  |
| 0.0043        | 2.5428 | 1500 | 0.0204          | 21.4712 | 7.1975  |

### Framework versions

- Transformers 4.52.2
- Pytorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2

---

## Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

| Metric   | Value   | Description                       |
|----------|---------|-----------------------------------|
| **WER**  | 26.019% | Word Error Rate (lower is better) |
| **CER**  | 9.426%  | Character Error Rate              |
| **BLEU** | 59.446% | BLEU Score (higher is better)     |
| **ChrF** | 82.902  | Character n-gram F-score          |

>👉 Review the testing script: [Testing Whisper Large V3 Urdu](https://www.kaggle.com/code/kingabzpro/testing-urdu-whisper-large-v3)
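
To make the WER and CER rows in the evaluation table easier to interpret, here is a minimal sketch of how the two metrics are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, computed over words for WER and over characters for CER. This is not the linked testing script. The function names `edit_distance`, `wer`, and `cer` are illustrative, the hypothesis string is a made-up example, and no text normalization is applied, so scores from this sketch may not match the reported figures exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]  # ref[:i] versus an empty hypothesis costs i deletions
        for j, hyp_item in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Reference text taken from the Quick Usage output above.
reference = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
# Hypothetical model output with the final word missing (illustration only).
hypothesis = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی"

print(f"WER: {wer(reference, hypothesis):.2f}%")
print(f"CER: {cer(reference, hypothesis):.2f}%")
```

Because both metrics are normalized by the reference length, a single dropped word costs more WER on a short sentence than on a long one, and CER is usually lower than WER since a wrong word often still shares most of its characters with the reference.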