Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,86 +1,41 @@
 ---
-library_name: transformers
-license: mit
 base_model: openai/whisper-large-v3-turbo
 model-index:
-- name: whisper-large-v3-turbo
-  results:
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# whisper-large-v3-turbo-bn
-
-This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1089
-- Model Preparation Time: 0.0054
-- Wer Ortho: 26.4357
-- Wer: 11.0532
-- Cer Ortho: 7.5370
-- Cer: 6.0587
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 64
-- eval_batch_size: 64
-- seed: 42
-- optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- training_steps: 2000
-- mixed_precision_training: Native AMP
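The linear warmup/decay schedule implied by these settings can be sketched as follows. This mirrors what `lr_scheduler_type: linear` with `lr_scheduler_warmup_steps: 50` and `training_steps: 2000` does inside the transformers Trainer; it is an illustrative reimplementation, not code from this repository.

```python
# Sketch of the learning-rate schedule implied by the hyperparameters above:
# linear warmup for 50 steps up to 1e-5, then linear decay to 0 at step 2000.
PEAK_LR = 1e-5
WARMUP_STEPS = 50
TOTAL_STEPS = 2000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup + linear decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at TOTAL_STEPS.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(lr_at(0))     # 0.0
print(lr_at(50))    # peak: 1e-05
print(lr_at(2000))  # 0.0
```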
-
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Wer Ortho | Wer | Cer Ortho | Cer |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 0.3227 | 0.2985 | 100 | 0.1940 | 0.0054 | 51.5425 | 24.7080 | 15.4053 | 12.4001 |
-| 0.1585 | 0.5970 | 200 | 0.1541 | 0.0054 | 43.2052 | 20.5941 | 13.2261 | 10.8048 |
-| 0.1296 | 0.8955 | 300 | 0.1284 | 0.0054 | 37.8651 | 17.2373 | 11.1337 | 9.0344 |
-| 0.0984 | 1.1940 | 400 | 0.1181 | 0.0054 | 35.7179 | 15.8488 | 10.4589 | 8.4411 |
-| 0.0831 | 1.4925 | 500 | 0.1117 | 0.0054 | 33.2943 | 14.7721 | 9.6520 | 7.8391 |
-| 0.0778 | 1.7910 | 600 | 0.1051 | 0.0054 | 31.8858 | 13.8226 | 9.1472 | 7.3725 |
-| 0.0676 | 2.0896 | 700 | 0.1034 | 0.0054 | 30.0816 | 13.0923 | 8.6039 | 6.9990 |
-| 0.0498 | 2.3881 | 800 | 0.0993 | 0.0054 | 29.2819 | 12.6100 | 8.3244 | 6.7352 |
-| 0.0487 | 2.6866 | 900 | 0.0960 | 0.0054 | 28.8242 | 12.4199 | 8.3398 | 6.6952 |
-| 0.0473 | 2.9851 | 1000 | 0.0946 | 0.0054 | 28.5619 | 12.1879 | 8.1810 | 6.6184 |
-| 0.0322 | 3.2836 | 1100 | 0.0994 | 0.0054 | 27.7306 | 11.7283 | 7.8936 | 6.3322 |
-| 0.0304 | 3.5821 | 1200 | 0.0974 | 0.0054 | 27.9168 | 11.8686 | 8.0736 | 6.4797 |
-| 0.0304 | 3.8806 | 1300 | 0.0956 | 0.0054 | 27.2904 | 11.4362 | 7.7514 | 6.2139 |
-| 0.0228 | 4.1791 | 1400 | 0.1023 | 0.0054 | 26.9930 | 11.2349 | 7.6544 | 6.1286 |
-| 0.0179 | 4.4776 | 1500 | 0.0998 | 0.0054 | 26.7448 | 11.1543 | 7.6114 | 6.1014 |
-| 0.0175 | 4.7761 | 1600 | 0.1014 | 0.0054 | 26.7975 | 11.1427 | 7.5925 | 6.0777 |
-| 0.0163 | 5.0746 | 1700 | 0.1075 | 0.0054 | 26.7530 | 11.1690 | 7.6298 | 6.1284 |
-| 0.01 | 5.3731 | 1800 | 0.1086 | 0.0054 | 26.5434 | 11.1396 | 7.5930 | 6.1084 |
-| 0.0097 | 5.6716 | 1900 | 0.1084 | 0.0054 | 26.5446 | 11.0813 | 7.5709 | 6.0733 |
-| 0.0096 | 5.9701 | 2000 | 0.1089 | 0.0054 | 26.4357 | 11.0532 | 7.5370 | 6.0587 |
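The epoch column in this table can be cross-checked against the dataset size reported elsewhere in this card (21409 training samples) and the batch size of 64, assuming no gradient accumulation:

```python
# Cross-check: one epoch is ceil(21409 / 64) = 335 optimizer steps, which
# reproduces the epoch values logged every 100 steps in the table above.
import math

samples, batch_size = 21409, 64
steps_per_epoch = math.ceil(samples / batch_size)

print(steps_per_epoch)                   # 335
print(round(100 / steps_per_epoch, 4))   # 0.2985, matching the first row
print(round(2000 / steps_per_epoch, 4))  # 5.9701, matching the last row
```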
-
-### Framework versions
-
 ---
 base_model: openai/whisper-large-v3-turbo
+datasets:
+- bn
+language: bn
+library_name: transformers
+license: apache-2.0
 model-index:
+- name: Finetuned openai/whisper-large-v3-turbo on Bengali
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech-to-Text
+    dataset:
+      name: Common Voice (Bengali)
+      type: common_voice
+    metrics:
+    - type: wer
+      value: 11.053
 ---

+# Finetuned openai/whisper-large-v3-turbo on 21409 Bengali training audio samples from cv-corpus-21.0-2025-03-14/bn.

+This model was created from the Mozilla.ai Blueprint:
+[speech-to-text-finetune](https://github.com/mozilla-ai/speech-to-text-finetune).

+## Evaluation results on 9363 audio samples of Bengali:

+### Baseline model (before finetuning) on Bengali
+- Word Error Rate (Normalized): 78.843
+- Word Error Rate (Orthographic): 107.027
+- Character Error Rate (Normalized): 62.521
+- Character Error Rate (Orthographic): 72.012
+- Loss: 1.074

+### Finetuned model (after finetuning) on Bengali
+- Word Error Rate (Normalized): 11.053
+- Word Error Rate (Orthographic): 26.436
+- Character Error Rate (Normalized): 6.059
+- Character Error Rate (Orthographic): 7.537
+- Loss: 0.109