---
library_name: peft
datasets:
- mozilla-foundation/common_voice_17_0
language:
- bn
base_model:
- openai/whisper-base
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Base Bn LoRA Adapter - BanglaBridge
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 17.0
      type: mozilla-foundation/common_voice_17_0
      config: bn
      split: None
      args: 'config: bn, split: test'
    metrics:
    - name: Wer
      type: wer
      value: 22.56397
---

# Whisper Base Bn LoRA Adapter - by BanglaBridge

This model is a PEFT LoRA fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the Bengali (bn) subset of the Common Voice 17.0 dataset. It achieves the following results on the test set:

- WER: 44.93734
- Normalized WER: 22.56397

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-03
- train_batch_size: 32
- eval_batch_size: 32
- warmup_steps: 500
- training_steps: 20000

LoraConfig:

- r: 32
- lora_alpha: 64
- target_modules: `["q_proj", "v_proj"]`
- lora_dropout: 0.005
- bias: none

### Training results

| Step  | Training Loss | Validation Loss |
|:-----:|:-------------:|:---------------:|
| 1000  | 0.240200      | 0.251211        |
| 2000  | 0.178700      | 0.210411        |
| 3000  | 0.150000      | 0.193197        |
| 4000  | 0.122500      | 0.184060        |
| 5000  | 0.122300      | 0.177079        |
| 6000  | 0.097100      | 0.181073        |
| 7000  | 0.095800      | 0.175566        |
| 8000  | 0.071400      | 0.173997        |
| 9000  | 0.082600      | 0.175677        |
| 10000 | 0.064400      | 0.178262        |
| 11000 | 0.064700      | 0.177943        |
| 12000 | 0.046900      | 0.185763        |
| 13000 | 0.047200      | 0.186843        |
| 14000 | 0.037500      | 0.193575        |
| 15000 | 0.036000      | 0.199084        |
| 16000 | 0.027500      | 0.208745        |
| 17000 | 0.025200      | 0.215685        |
| 18000 | 0.017400      | 0.227938        |
| 19000 | 0.016500      | 0.236160        |
| 20000 | 0.013000      | 0.240447        |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.19.1
- Peft 0.10.0