Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,86 +1,41 @@
 ---
-library_name: transformers
-license: mit
 base_model: openai/whisper-large-v3-turbo
 model-index:
-- name: whisper-large-v3-turbo
-  results:
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# whisper-large-v3-turbo-bn
-
-This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1089
-- Model Preparation Time: 0.0054
-- Wer Ortho: 26.4357
-- Wer: 11.0532
-- Cer Ortho: 7.5370
-- Cer: 6.0587
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 64
-- eval_batch_size: 64
-- seed: 42
-- optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- training_steps: 2000
-- mixed_precision_training: Native AMP
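The linear warmup/decay schedule implied by these settings can be sketched as follows. This mirrors what `lr_scheduler_type: linear` with `lr_scheduler_warmup_steps: 50` and `training_steps: 2000` does inside the transformers Trainer; it is an illustrative reimplementation, not code from this repository.

```python
# Sketch of the learning-rate schedule implied by the hyperparameters above:
# linear warmup for 50 steps up to 1e-5, then linear decay to 0 at step 2000.
PEAK_LR = 1e-5
WARMUP_STEPS = 50
TOTAL_STEPS = 2000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup + linear decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at TOTAL_STEPS.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(lr_at(0))     # 0.0
print(lr_at(50))    # peak: 1e-05
print(lr_at(2000))  # 0.0
```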
-
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Wer Ortho | Wer | Cer Ortho | Cer |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 0.3227 | 0.2985 | 100 | 0.1940 | 0.0054 | 51.5425 | 24.7080 | 15.4053 | 12.4001 |
-| 0.1585 | 0.5970 | 200 | 0.1541 | 0.0054 | 43.2052 | 20.5941 | 13.2261 | 10.8048 |
-| 0.1296 | 0.8955 | 300 | 0.1284 | 0.0054 | 37.8651 | 17.2373 | 11.1337 | 9.0344 |
-| 0.0984 | 1.1940 | 400 | 0.1181 | 0.0054 | 35.7179 | 15.8488 | 10.4589 | 8.4411 |
-| 0.0831 | 1.4925 | 500 | 0.1117 | 0.0054 | 33.2943 | 14.7721 | 9.6520 | 7.8391 |
-| 0.0778 | 1.7910 | 600 | 0.1051 | 0.0054 | 31.8858 | 13.8226 | 9.1472 | 7.3725 |
-| 0.0676 | 2.0896 | 700 | 0.1034 | 0.0054 | 30.0816 | 13.0923 | 8.6039 | 6.9990 |
-| 0.0498 | 2.3881 | 800 | 0.0993 | 0.0054 | 29.2819 | 12.6100 | 8.3244 | 6.7352 |
-| 0.0487 | 2.6866 | 900 | 0.0960 | 0.0054 | 28.8242 | 12.4199 | 8.3398 | 6.6952 |
-| 0.0473 | 2.9851 | 1000 | 0.0946 | 0.0054 | 28.5619 | 12.1879 | 8.1810 | 6.6184 |
-| 0.0322 | 3.2836 | 1100 | 0.0994 | 0.0054 | 27.7306 | 11.7283 | 7.8936 | 6.3322 |
-| 0.0304 | 3.5821 | 1200 | 0.0974 | 0.0054 | 27.9168 | 11.8686 | 8.0736 | 6.4797 |
-| 0.0304 | 3.8806 | 1300 | 0.0956 | 0.0054 | 27.2904 | 11.4362 | 7.7514 | 6.2139 |
-| 0.0228 | 4.1791 | 1400 | 0.1023 | 0.0054 | 26.9930 | 11.2349 | 7.6544 | 6.1286 |
-| 0.0179 | 4.4776 | 1500 | 0.0998 | 0.0054 | 26.7448 | 11.1543 | 7.6114 | 6.1014 |
-| 0.0175 | 4.7761 | 1600 | 0.1014 | 0.0054 | 26.7975 | 11.1427 | 7.5925 | 6.0777 |
-| 0.0163 | 5.0746 | 1700 | 0.1075 | 0.0054 | 26.7530 | 11.1690 | 7.6298 | 6.1284 |
-| 0.01 | 5.3731 | 1800 | 0.1086 | 0.0054 | 26.5434 | 11.1396 | 7.5930 | 6.1084 |
-| 0.0097 | 5.6716 | 1900 | 0.1084 | 0.0054 | 26.5446 | 11.0813 | 7.5709 | 6.0733 |
-| 0.0096 | 5.9701 | 2000 | 0.1089 | 0.0054 | 26.4357 | 11.0532 | 7.5370 | 6.0587 |
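The epoch column in this table can be cross-checked against the dataset size reported elsewhere in this card (21409 training samples) and the batch size of 64, assuming no gradient accumulation:

```python
# Cross-check: one epoch is ceil(21409 / 64) = 335 optimizer steps, which
# reproduces the epoch values logged every 100 steps in the table above.
import math

samples, batch_size = 21409, 64
steps_per_epoch = math.ceil(samples / batch_size)

print(steps_per_epoch)                   # 335
print(round(100 / steps_per_epoch, 4))   # 0.2985, matching the first row
print(round(2000 / steps_per_epoch, 4))  # 5.9701, matching the last row
```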
-
-### Framework versions
-
 ---
 base_model: openai/whisper-large-v3-turbo
+datasets:
+- bn
+language: bn
+library_name: transformers
+license: apache-2.0
 model-index:
+- name: Finetuned openai/whisper-large-v3-turbo on Bengali
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech-to-Text
+    dataset:
+      name: Common Voice (Bengali)
+      type: common_voice
+    metrics:
+    - type: wer
+      value: 11.053
 ---

+# Finetuned openai/whisper-large-v3-turbo on 21409 Bengali training audio samples from cv-corpus-21.0-2025-03-14/bn.

+This model was created from the Mozilla.ai Blueprint:
+[speech-to-text-finetune](https://github.com/mozilla-ai/speech-to-text-finetune).

+## Evaluation results on 9363 audio samples of Bengali:

+### Baseline model (before finetuning) on Bengali
+- Word Error Rate (Normalized): 78.843
+- Word Error Rate (Orthographic): 107.027
+- Character Error Rate (Normalized): 62.521
+- Character Error Rate (Orthographic): 72.012
+- Loss: 1.074

+### Finetuned model (after finetuning) on Bengali
+- Word Error Rate (Normalized): 11.053
+- Word Error Rate (Orthographic): 26.436
+- Character Error Rate (Normalized): 6.059
+- Character Error Rate (Orthographic): 7.537
+- Loss: 0.109