Ashegh-Sad-Warrior/persian-whisper-large-v3-10-percent-17-0-one-epoch

@@ -1,79 +1,103 @@
----
-language:
-- fa
-license: apache-2.0
-base_model: openai/whisper-large-v3
-tags:
-- generated_from_trainer
-datasets:
-- mozilla-foundation-common-voice-17-0
-metrics:
-- wer
-model-index:
-- name: Whisper LargeV3 Persian - Persian ASR
-  results:
-  - task:
-      name: Automatic Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: common-voice-17-0
-      type: mozilla-foundation-common-voice-17-0
-      config: default
-      split: test[:10%]
-      args: 'config: Persian, split: train[:10%]+validation[:10%]'
-    metrics:
-    - name: Wer
-      type: wer
-      value: 38.94514767932489
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Whisper LargeV3 Persian - Persian ASR
-This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the common-voice-17-0 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.4072
-- Wer: 38.9451
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 4
-- eval_batch_size: 4
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 1
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Wer     |
-|:-------------:|:-----:|:----:|:---------------:|:-------:|
-| 0.2083        | 1.0   | 987  | 0.4072          | 38.9451 |
-### Framework versions
-- Transformers 4.44.0
-- Pytorch 2.4.0+cu121
-- Datasets 2.21.0
-- Tokenizers 0.19.1

+---
+language:
+- fa
+license: apache-2.0
+base_model: openai/whisper-large-v3
+tags:
+- generated_from_trainer
+datasets:
+- mozilla-foundation-common-voice-17-0
+metrics:
+- wer
+model-index:
+- name: Whisper LargeV3 Persian - Persian ASR
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: common-voice-17-0
+      type: mozilla-foundation-common-voice-17-0
+      config: default
+      split: test[:10%]
+      args: 'config: Persian, split: train[:10%]+validation[:10%]'
+    metrics:
+    - name: Wer
+      type: wer
+      value: 38.94514767932489
+---
+# Whisper LargeV3 Persian - Persian ASR
+This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the Persian subset of the Common Voice 17.0 dataset.
+It is trained for automatic speech recognition (ASR), transcribing spoken Persian into text.
+The sections below describe its performance, intended uses, training data, and training procedure.
+On the evaluation set (the first 10% of the Common Voice 17.0 Persian test split), it achieves the following results:
+- Loss: 0.4072
+- WER: 38.9451%
+## Model description
+This model uses the Whisper architecture, a Transformer encoder-decoder that OpenAI pre-trained on large-scale, weakly supervised multilingual and multitask data (speech transcription and speech translation).
+Persian is among the languages the base model supports, and this checkpoint adapts it specifically to Persian by fine-tuning on Common Voice data, with the aim of improving Persian transcription accuracy.
+## Intended uses & limitations
+This model is intended for Persian speech-to-text tasks, such as transcribing audio files or serving as the speech front end of voice-controlled systems.
+Its evaluation WER of about 39% means that transcripts will often contain errors, so outputs should be reviewed before use in settings that require high accuracy.
+The model was fine-tuned for a single epoch on the first 10% of the Common Voice 17.0 Persian train and validation splits, and it may perform worse on noisy audio, accents underrepresented in that data, or technical vocabulary that does not appear in it.
+Further fine-tuning on in-domain data is recommended if your use case involves specialized vocabulary or contexts.
+## Training and evaluation data
+The model was fine-tuned on the Persian subset of Common Voice 17.0, a crowd-sourced corpus of sentences read aloud by volunteer contributors.
+Only part of the data was used. Training used the first 10% of the train split combined with the first 10% of the validation split (`train[:10%]+validation[:10%]`), and evaluation used the first 10% of the test split (`test[:10%]`).
+Because many volunteers contribute recordings, the data covers a range of speakers and accents, but no per-accent or per-dialect results are reported, and a 10% slice of the test split gives a narrower assessment than the full test set.
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
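+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 (epsilon is added to the denominator of each update to avoid division by zero and improve numerical stability)
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500 (the learning rate increases linearly from 0 to 1e-05 over the first 500 steps, then decreases linearly toward 0 by the final step)
+- num_epochs: 1 (a single pass over the training subset, 987 optimizer steps)
+- mixed_precision_training: Native AMP (PyTorch automatic mixed precision, which runs eligible operations in 16-bit floating point to speed up training and reduce memory use)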
+### Training results
+During training, the model achieved the following results:
+- Training loss: 0.2083 at the end of the epoch (step 987).
+- Validation loss: 0.4072 on the held-out evaluation split.
+- Word error rate (WER): 38.9451%, the number of word substitutions, deletions, and insertions needed to turn the predicted transcript into the reference, divided by the number of words in the reference, measured on the evaluation split.
+
+| Training Loss | Epoch | Step | Validation Loss | WER     |
+|:-------------:|:-----:|:----:|:---------------:|:-------:|
+| 0.2083        | 1.0   | 987  | 0.4072          | 38.9451 |
+
+A WER of about 39% after one epoch on 10% of the training data suggests there is room for further improvement, for example by training on more data or for more epochs.
+### Framework versions
+The model was trained with the following library versions:
+- Transformers 4.44.0 (model, processor, and fine-tuning APIs for transformer models such as Whisper)
+- PyTorch 2.4.0+cu121 (deep learning framework, CUDA 12.1 build)
+- Datasets 2.21.0 (loading and preprocessing of the Common Voice data)
+- Tokenizers 0.19.1 (text tokenization)
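
To make the optimizer line of the card concrete, here is a scalar sketch of one bias-corrected Adam update using the card's hyperparameters (lr=1e-05, betas=(0.9, 0.999), epsilon=1e-08). It is illustrative only: the helper `adam_step` is hypothetical, and the training run used a library optimizer implementation, not this code. It shows that the first update has a magnitude close to the learning rate, and that epsilon keeps the update finite when the second-moment estimate is zero.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (hypothetical, illustrative).

    m and v are running estimates of the first and second gradient moments;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # correct the bias from zero initialization
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator away from zero when v_hat is tiny or zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step with a positive gradient: the parameter moves down by about lr.
param, m, v = adam_step(1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param)

# A zero gradient leaves the parameter unchanged instead of dividing by zero.
print(adam_step(1.0, grad=0.0, m=0.0, v=0.0, t=1)[0])
```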
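
The `linear` scheduler with 500 warmup steps described in the card can be sketched as follows, assuming it follows the same formula as the Transformers `get_linear_schedule_with_warmup` helper (linear increase from 0 to the peak learning rate, then linear decay to 0) and that the run had 987 total optimizer steps, as in the training results table. The helper names `linear_warmup_lr` and `steps_per_epoch` are hypothetical. The second helper checks the step count: assuming a single device, no gradient accumulation, and a kept final partial batch (none of which the card states), 987 steps at batch size 4 corresponds to 3,945 to 3,948 training examples.

```python
import math

PEAK_LR = 1e-5        # learning_rate from the card
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 987     # final step in the training results table


def linear_warmup_lr(step: int, peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at warmup_steps to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


def steps_per_epoch(num_examples: int, batch_size: int = 4) -> int:
    """Optimizer steps in one epoch, assuming one device, no gradient
    accumulation, and that the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


for step in (0, 250, 500, 743, 987):
    print(step, linear_warmup_lr(step))

# Training-set sizes consistent with 987 steps per epoch at batch size 4.
print([n for n in range(3900, 4000) if steps_per_epoch(n) == 987])
```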
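
The WER reported in the card (38.9451) is a percentage based on a word-level edit distance. Below is a minimal, dependency-free sketch of that computation, assuming whitespace tokenization and no text normalization. The function `word_error_rate` is hypothetical; the evaluation code behind the reported number is not described in the card and may tokenize or normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (S + D + I) / N * 100.

    Hypothetical helper: whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


reference = "من امروز به مدرسه رفتم"   # "I went to school today"
hypothesis = "من دیروز به مدرسه رفتم"  # "today" misrecognized as "yesterday"
print(word_error_rate(reference, hypothesis))  # 20.0
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.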