MAdel121 committed commit 52a06d8 (verified) · 1 Parent(s): 446af13

Upload fine-tuned Whisper Medium Egyptian model: whisper-medium-egy

README.md ADDED
@@ -0,0 +1,150 @@
+ ---
+ language: ar
+ license: apache-2.0
+ tags:
+ - whisper
+ - automatic-speech-recognition
+ - asr
+ - audio
+ - arabic
+ - egyptian-arabic
+ datasets:
+ - MAdel121/arabic-egy-cleaned
+ metrics:
+ - wer
+ - cer
+ base_model: openai/whisper-medium
+ pipeline_tag: automatic-speech-recognition
+ library_name: transformers
+ model-index:
+ - name: whisper-medium-egy
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Speech Recognition
+     dataset:
+       name: MAdel121/arabic-egy-cleaned (validation split)
+       type: MAdel121/arabic-egy-cleaned
+       config: ar
+       split: validation
+     metrics:
+     - name: WER
+       type: wer
+       value: 18.029990439289488
+     - name: CER
+       type: cer
+       value: 13.375029793807732
+ ---
38
+
39
+ # Whisper Medium Egyptian Arabic (whisper-medium-egy)
40
+
41
+ This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium), trained on a custom dataset of approximately 72 hours of Egyptian Arabic speech. It is intended for automatic speech recognition (ASR) of the Egyptian Arabic dialect.
42
+
43
+ ## Model Description
44
+
45
+ * **Base Model:** `openai/whisper-medium`
46
+ * **Language:** Arabic (ar), specifically focused on Egyptian dialect (arz)
47
+ * **Fine-tuning Dataset:** `MAdel121/arabic-egy-cleaned` (approx. 72 hours)
48
+ * **Total Training Steps:** 7299
49
+ * **Epochs:** 10
50
+
51
+ ## Intended Uses & Limitations
52
+
53
+ This model is intended for transcribing speech in Egyptian Arabic.
54
+
55
+ **Intended Use:**
56
+ * Automatic transcription of audio recordings and live speech in Egyptian Arabic.
57
+ * Assisting with content creation, subtitling, and voice-controlled applications for Egyptian Arabic speakers.
58
+
59
+ **Limitations:**
60
+ * Performance may degrade in highly noisy environments or with very strong, non-Egyptian accents.
61
+ * The model was fine-tuned on a specific dataset; its performance on significantly different domains or audio characteristics might vary.
62
+ * The sources and domains of the training data are not documented in this card. Performance is likely to be better on audio that resembles the training set.
63
+
64
+ ## How to Use
65
+
66
+ You can use this model with the `transformers` library and the `pipeline` interface for ease of use.
67
+
68
+ ```python
+ from transformers import pipeline
+ import torch
+
+ # Use the first GPU when one is available, otherwise fall back to CPU.
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     # Model ID taken from the citation URL in this card; adjust it if the repository lives elsewhere.
+     model="MAdel121/whisper-medium-egy",
+     device=device,
+ )
+
+ # Example with a local audio file
+ # audio_file = "path/to/your/egyptian_arabic_audio.wav"
+ # transcription = pipe(audio_file, generate_kwargs={"language": "arabic"})["text"]
+ # print(transcription)
+
+ # Example with a Hugging Face dataset audio sample
+ # from datasets import load_dataset
+ # ds = load_dataset("MAdel121/arabic-egy-cleaned", "ar", split="validation")  # Or your test split
+ # sample = ds[0]["audio"]  # Make sure your dataset has an "audio" column
+ # result = pipe(sample.copy(), generate_kwargs={"language": "arabic"})
+ # print(result["text"])
+ ```
+ Passing `generate_kwargs={"language": "arabic"}` matters for Whisper models: it makes the decoder transcribe in Arabic instead of relying on automatic language detection.
93
+
94
+ ## Training Data
95
+
96
+ The model was fine-tuned on the `MAdel121/arabic-egy-cleaned` dataset available on the Hugging Face Hub. This dataset contains approximately 72 hours of Egyptian Arabic audio paired with transcripts.
97
+
98
+ ## Training Procedure
99
+
100
+ The model was trained using the `transformers` library. The fine-tuning process involved the following key hyperparameters:
101
+
102
+ * **Base Model:** `openai/whisper-medium`
103
+ * **Optimizer:** AdamW
104
+ * **Learning Rate:** 1e-5 (0.00001)
105
+ * **Warmup Steps:** 1000
106
+ * **Weight Decay:** 0.05
107
+ * **Gradient Accumulation Factor:** 2
108
+ * **Batch Size (loader_batch_size):** 8 (with a gradient accumulation factor of 2, the effective batch size is 8 × 2 = 16)
109
+ * **Number of Epochs:** 10
110
+ * **Max Grad Norm:** 5
111
+ * **Augmentations Used:**
+     * `use_drop_freq`: true
+     * `use_drop_chunk`: true
+     * `use_drop_bit_resolution`: true
+     * All other augmentations (`use_add_noise`, `use_speed_perturb`, `use_pitch_shift`, `use_add_reverb`, `use_codec_augment`, `use_gain`) were set to `false`
116
+ * **Task:** transcribe
117
+ * **Language:** ar
118
+ * **Seed:** 1986
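
To make two of the enabled augmentations more concrete, here is a minimal pure-Python sketch of what "drop chunk" and "drop bit resolution" could look like on a list of samples in [-1, 1]. It is an illustration only and is not the implementation used by the training script: the function names, the single-chunk behaviour, and the `bits` parameter are assumptions made for the example. Frequency dropping (`use_drop_freq`) is omitted because it needs a band-stop filter.

```python
import random


def drop_chunk(samples, chunk_len, rng):
    """Return a copy of `samples` with one randomly placed chunk set to zero.

    Hypothetical helper: real augmenters typically drop several chunks of
    random length; a single fixed-length chunk keeps the idea visible.
    """
    out = list(samples)
    if chunk_len <= 0 or chunk_len > len(out):
        return out
    start = rng.randrange(0, len(out) - chunk_len + 1)
    for i in range(start, start + chunk_len):
        out[i] = 0.0
    return out


def drop_bit_resolution(samples, bits):
    """Quantize samples in [-1, 1] onto a signed grid with `bits` bits.

    Hypothetical helper: this simulates a lower bit depth by rounding each
    sample to the nearest representable level and scaling back.
    """
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * levels) / levels for s in samples]


rng = random.Random(1986)  # same seed value as listed above, used here only for repeatability
wave = [0.5] * 16
chunked = drop_chunk(wave, chunk_len=4, rng=rng)
quantized = drop_bit_resolution([0.3, -0.7, 1.0], bits=4)
print(chunked)
print(quantized)
```

Both helpers return new lists and leave the input untouched, which makes it easy to chain them or to apply each with some probability per training example.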
119
+
120
+ The training was managed and tracked using Weights & Biases under the project `whisper-medium-egyptian-arabic` with resume ID `r3sz4v27`.
121
+
122
+ ## Training Code
123
+
124
+ The training script is available [on GitHub](https://github.com/moadel321/Fine-tuning-whisper-on-Modal-Labs-with-speech-brain-augmentations-/blob/c85312785faa2b927cbc217fe43acb8ed660d2ee/train_whisper_modal.py).
125
+
126
+ ## Weights & Biases
127
+
128
+ The training run is available at https://wandb.ai/m-adelomar1/whisper-medium-egyptian-arabic/
129
+
130
+ ## Evaluation Results
131
+
132
+ The model was evaluated on the `validation` split of the `MAdel121/arabic-egy-cleaned` dataset.
133
+
134
+ * **Word Error Rate (WER):** 18.03%
135
+ * **Character Error Rate (CER):** 13.38%
136
+
137
+ Both figures are measured on the validation split only. Lower values are better.
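
For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, at word or character level. It is a self-contained illustration, not the evaluation code behind the numbers above; the function names and the English toy sentences are assumptions, and any text normalization applied during the real evaluation is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate in percent; `unit` is "word" (WER) or "char" (CER)."""
    tokenize = str.split if unit == "word" else list  # characters include spaces
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / total


refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.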
138
+
139
+ ## BibTeX Citation
140
+
141
+ ```bibtex
+ @misc{madel_2025_whisper_medium_egy,
+   author       = {Madel},
+   title        = {Whisper Medium Fine-tuned for Egyptian Arabic},
+   year         = {2025},
+   publisher    = {Hugging Face},
+   journal      = {Hugging Face Hub},
+   howpublished = {\url{https://huggingface.co/MAdel121/whisper-medium-egy}}
+ }
+ ```
model/CKPT.yaml ADDED
@@ -0,0 +1,4 @@
+ # yamllint disable
+ brain_intra_epoch_ckpt: true
+ end-of-epoch: false
+ unixtime: 1746494038.3237214
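
The `unixtime` field is a POSIX timestamp in seconds. As a small sketch, assuming it records when the checkpoint was written, it can be converted to a readable UTC time with the standard library:

```python
from datetime import datetime, timezone

unixtime = 1746494038.3237214  # value copied from model/CKPT.yaml
saved_at = datetime.fromtimestamp(unixtime, tz=timezone.utc)
print(saved_at.isoformat())
```

Passing `tz=timezone.utc` avoids depending on the local time zone of the machine that runs the snippet.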
model/brain.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64af57c5b2b2982bda94205f9340a6e14b9fa13e472b89793fbd36575371282b
+ size 65
model/counter.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5
+ size 2
model/dataloader-TRAIN.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b24bdc2fb415e6a7038f442fd99a7144f3cfe358086a1ba9cfb1ac0a44ed7bb2
+ size 4
model/model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d792ab272f5fb4d0d48b7b6836d79b1ebed948b7872aa0c9f827c25f6d956e25
+ size 3055793114
model/optimizer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:852d2cad94668a6e9b2f1ca78a9d792f5430ec87fed36adbc9ae04a1783b043f
+ size 6111664039
model/scheduler.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:000d9d4bec2874c99cd692c4431560aab31f77ae0d6b007244172cda4ac86c42
+ size 936
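
The `.ckpt` entries above are Git LFS pointer files: three `key value` lines that stand in for the real binary, which is stored separately. The sketch below parses one such pointer and reports the size of the object it refers to. The helper name `parse_lfs_pointer` is hypothetical, and the parser only handles the three keys that appear in this commit.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size in bytes."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from model/model.ckpt in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d792ab272f5fb4d0d48b7b6836d79b1ebed948b7872aa0c9f827c25f6d956e25
size 3055793114
"""

info = parse_lfs_pointer(pointer)
size_gib = info["size"] / 2**30
print(f"{info['hash_algo']} object, {size_gib:.2f} GiB")
```

Applied to the other pointers, the same arithmetic shows that `model/optimizer.ckpt` (6111664039 bytes) is roughly twice the size of the model weights.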