metadata

library_name: peft
datasets:
  - mozilla-foundation/common_voice_17_0
language:
  - bn
base_model:
  - openai/whisper-base
license: apache-2.0
metrics:
  - wer
pipeline_tag: automatic-speech-recognition
model-index:
  - name: Whisper Base Bn LoRA Adapter - BanglaBridge
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 17.0
          type: mozilla-foundation/common_voice_17_0
          config: bn
          split: test
          args: 'config: bn, split: test'
        metrics:
          - name: Wer
            type: wer
            value: 22.56397

Whisper Base Bn LoRA Adapter - by BanglaBridge

This model is a LoRA adapter for openai/whisper-base, fine-tuned with PEFT on the Bengali (bn) subset of the Common Voice 17.0 dataset. It achieves the following results on the test set:

WER: 44.93734
Normalized WER: 22.56397
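
The gap between the two figures comes from normalizing the reference and predicted text before scoring. This card does not state which normalizer was used, so the following is only a minimal sketch of how word error rate is computed and how normalization can change it. The `normalize` helper is a hypothetical stand-in that strips punctuation, not the normalizer behind the reported numbers.

```python
import unicodedata


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edits divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * word_edit_distance(ref, hyp) / len(ref)


def normalize(text: str) -> str:
    """Hypothetical normalizer: replace punctuation with spaces, collapse whitespace."""
    kept = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(kept.split())


# A trailing danda counts as a word error until it is normalized away.
reference = "আমি ভাত খাই।"
hypothesis = "আমি ভাত খাই"
raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
```

In this toy pair the only difference is sentence-final punctuation, so the raw score penalizes one of three words and the normalized score penalizes none.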

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-03
train_batch_size: 32
eval_batch_size: 32
warmup_steps: 500
training_steps: 20000
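
The card gives the peak learning rate, warmup steps, and total steps, but not the scheduler type. Assuming the Transformers `Trainer` default (linear warmup followed by linear decay to zero), the learning rate at a given step could be sketched as below. The function name `lr_at` is illustrative only.

```python
PEAK_LR = 1e-3
WARMUP_STEPS = 500
TRAINING_STEPS = 20_000


def lr_at(step: int) -> float:
    """Learning rate at `step` under an ASSUMED linear warmup + linear decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak over the warmup steps.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

Under this assumption the rate peaks at step 500 and is back to half the peak at step 10250.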

LoraConfig:

r: 32
lora_alpha: 64
target_modules: ["q_proj", "v_proj"]
lora_dropout: 0.005
bias: none
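
These values map onto the keyword arguments of `peft.LoraConfig`. The sketch below keeps them in a plain dict, derives the standard LoRA scaling factor `lora_alpha / r`, and estimates the number of adapter parameters. The parameter estimate rests on assumptions not stated in this card: that whisper-base has a hidden size of 512 with 6 encoder and 6 decoder layers, and that `q_proj` and `v_proj` are matched in encoder self-attention, decoder self-attention, and decoder cross-attention.

```python
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.005,
    "bias": "none",
}

# Standard LoRA scaling applied to the low-rank update: alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build the PEFT config; peft is imported lazily so this file loads without it."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


# ASSUMED whisper-base shape (not stated in this card).
D_MODEL = 512
ATTENTION_BLOCKS = 6 + 6 + 6  # encoder self-attn, decoder self-attn, decoder cross-attn


def lora_param_count() -> int:
    """Adapter parameters: each target gets A (r x d) and B (d x r)."""
    per_module = 2 * LORA_KWARGS["r"] * D_MODEL
    modules = ATTENTION_BLOCKS * len(LORA_KWARGS["target_modules"])
    return per_module * modules
```

With these assumptions the adapter adds roughly 1.18 million trainable parameters, and the low-rank update is scaled by 2.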

Training results

Step	Training Loss	Validation Loss
1000	0.240200	0.251211
2000	0.178700	0.210411
3000	0.150000	0.193197
4000	0.122500	0.184060
5000	0.122300	0.177079
6000	0.097100	0.181073
7000	0.095800	0.175566
8000	0.071400	0.173997
9000	0.082600	0.175677
10000	0.064400	0.178262
11000	0.064700	0.177943
12000	0.046900	0.185763
13000	0.047200	0.186843
14000	0.037500	0.193575
15000	0.036000	0.199084
16000	0.027500	0.208745
17000	0.025200	0.215685
18000	0.017400	0.227938
19000	0.016500	0.236160
20000	0.013000	0.240447
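
In the table above, validation loss reaches its minimum partway through training and rises afterwards while training loss keeps falling, which suggests overfitting in the later steps. The card does not say which checkpoint produced the reported WER. A short sketch for locating the lowest-validation-loss step from the logged values:

```python
# (step, training_loss, validation_loss) copied from the table above.
HISTORY = [
    (1000, 0.240200, 0.251211), (2000, 0.178700, 0.210411),
    (3000, 0.150000, 0.193197), (4000, 0.122500, 0.184060),
    (5000, 0.122300, 0.177079), (6000, 0.097100, 0.181073),
    (7000, 0.095800, 0.175566), (8000, 0.071400, 0.173997),
    (9000, 0.082600, 0.175677), (10000, 0.064400, 0.178262),
    (11000, 0.064700, 0.177943), (12000, 0.046900, 0.185763),
    (13000, 0.047200, 0.186843), (14000, 0.037500, 0.193575),
    (15000, 0.036000, 0.199084), (16000, 0.027500, 0.208745),
    (17000, 0.025200, 0.215685), (18000, 0.017400, 0.227938),
    (19000, 0.016500, 0.236160), (20000, 0.013000, 0.240447),
]

# Row with the smallest validation loss.
best_step, best_train_loss, best_val_loss = min(HISTORY, key=lambda row: row[2])

# How far validation loss drifted upward by the final step.
final_val_loss = HISTORY[-1][2]
val_loss_increase = final_val_loss - best_val_loss

print(f"best step: {best_step}, validation loss: {best_val_loss:.6f}")
print(f"increase by final step: {val_loss_increase:.6f}")
```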

Framework versions

Transformers 4.40.2
PyTorch 2.6.0+cu124
Datasets 3.5.1
Tokenizers 0.19.1
PEFT 0.10.0
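
With the framework versions above installed, the adapter could be attached to the base model roughly as follows. This is a sketch, not the authors' inference code: the adapter's Hub repo id is not stated here, so it is left as the `adapter_id` parameter, and the function names are illustrative. Merging the adapter into the base weights is one option among several; it returns a plain Whisper model for generation.

```python
def load_merged_model(adapter_id: str):
    """Attach the LoRA adapter to openai/whisper-base and merge it in.

    `adapter_id` is the Hub repo id or local path of this adapter
    (not given in the card text, so the caller must supply it).
    """
    # Local imports keep this file importable without the heavy dependencies.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
    model = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    processor = WhisperProcessor.from_pretrained("openai/whisper-base")
    return model.eval(), processor


def transcribe(model, processor, audio, sampling_rate: int = 16_000) -> str:
    """Transcribe one mono waveform (1-D float array) as Bengali."""
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    ids = model.generate(
        input_features=inputs.input_features,
        language="bn",
        task="transcribe",
    )
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Whisper expects 16 kHz audio, so recordings at other rates would need resampling before being passed to `transcribe`.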