techiaith/whisper-large-v3-ft-commonvoice-cy

@@ -1,81 +1,53 @@
----
-license: apache-2.0
-base_model: openai/whisper-large-v3
-tags:
-- generated_from_trainer
-datasets:
-- DewiBrynJones/commonvoice_18_0_cy
-metrics:
-- wer
-model-index:
-- name: whisper-large-v3-ft-cv-cy-train-all-plus-other-with-excluded
-  results:
-  - task:
-      name: Automatic Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: DewiBrynJones/commonvoice_18_0_cy default
-      type: DewiBrynJones/commonvoice_18_0_cy
-      args: default
-    metrics:
-    - name: Wer
-      type: wer
-      value: 0.1676010974591435
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# whisper-large-v3-ft-cv-cy-train-all-plus-other-with-excluded
-This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the DewiBrynJones/commonvoice_18_0_cy default dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.3280
-- Wer: 0.1676
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- training_steps: 5000
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Wer    |
-|:-------------:|:------:|:----:|:---------------:|:------:|
-| 0.1583        | 1.4144 | 1000 | 0.2562          | 0.2062 |
-| 0.0675        | 2.8289 | 2000 | 0.2394          | 0.1849 |
-| 0.0113        | 4.2433 | 3000 | 0.2729          | 0.1722 |
-| 0.0036        | 5.6577 | 4000 | 0.3004          | 0.1705 |
-| 0.0012        | 7.0721 | 5000 | 0.3280          | 0.1676 |
-### Framework versions
-- Transformers 4.44.0
-- Pytorch 2.4.0+cu121
-- Datasets 2.20.0
-- Tokenizers 0.19.1

+---
+license: apache-2.0
+base_model: openai/whisper-large-v3
+tags:
+- generated_from_trainer
+- whisper
+datasets:
+- techiaith/commonvoice_18_0_cy
+metrics:
+- wer
+model-index:
+- name: whisper-large-v3-ft-cv-cy-train-all-plus-other-with-excluded
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: DewiBrynJones/commonvoice_18_0_cy default
+      type: DewiBrynJones/commonvoice_18_0_cy
+      args: default
+    metrics:
+    - name: Wer
+      type: wer
+      value: 0.185
+language:
+- cy
+pipeline_tag: automatic-speech-recognition
+---
+# whisper-large-v3-ft-cv-cy
+This model is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) fine-tuned with the
+`train_all` and `other_with_excluded` custom splits from [techiaith/commonvoice_18_0_cy](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy).
+It achieves the following results on the Common Voice for Welsh release 18's standard test set:
+ - WER: 18.50%
+ - CER: 5.32%
+N.B. This model performs considerably worse on English-language speech, but it performs better on Welsh than a [bilingual model](https://huggingface.co/techiaith/whisper-large-v3-ft-cv-cy-en).
+## Usage
+```python
+from transformers import pipeline
+transcriber = pipeline("automatic-speech-recognition", model="techiaith/whisper-large-v3-ft-cv-cy")
+result = transcriber("path/or/url/to/soundfile.wav")  # placeholder: path or URL to your own audio file
+print(result)
+```
+`{'text': 'Mae hen wlad fy nhadau yn annwyl i mi.'}`
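For readers unfamiliar with the WER and CER figures reported in the card, the following is a minimal sketch of how such scores are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over words (WER) or characters (CER). It is not the evaluation script used for this model. The helper names `edit_distance`, `wer` and `cer` are hypothetical, the misspelled hypothesis sentence is invented for illustration, and no text normalisation (case, punctuation) is applied, which a real evaluation may do differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair only: the hypothesis contains one invented spelling error.
reference = "Mae hen wlad fy nhadau yn annwyl i mi."
hypothesis = "Mae hen wlad fy nhadau yn anwyl i mi."

print(f"WER: {100 * wer(reference, hypothesis):.2f}%")
print(f"CER: {100 * cer(reference, hypothesis):.2f}%")
```

A single misspelled word counts as one full word error but only one character error, which is why CER is typically much lower than WER for the same transcript.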