CoRal-project/roest-wav2vec2-315m-v2

Commit a5509a5 (1 parent: ef8b96e)
MarieAlvenir committed on Apr 1

Transcription section formatting

README.md +52 -27


@@ -52,73 +52,89 @@ Next you can use the model using the `transformers` Python package as follows:
 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
-## Transcription examples
-### Example 1
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
-**Dialect:** Vestjysk
-**Transcription:** det blev til yderlig ti mål i den første sæson på trods af en position som back
-**Target transcription:** det blev til yderligere ti mål i den første sæson på trods af en position som back
-**CER:** 3.7%
-**WER:** 5.9%
-### Example 2
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
-**Dialect:** Sønderjysk
-**Transcription:** en arkitektoniske udformning af pladser forslagene iver benzen
-**Target transcription:** den arkitektoniske udformning af pladsen er forestået af ivar bentsen
-**CER:** 20.3%
-**WER:** 60.0%
-### Example 3
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
-**Dialect:** Nordsjællandsk
-**Transcription:** østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission
-**Target transcription:** østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission
-**CER:** 0.0%
-**WER:** 0.0%
-### Example 4
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
-**Dialect:** Lollandsk
-**Transcription:** det er produceret af thomas helme og indspillede i easy sound recording studio i københavn
-**Target transcription:** det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn
-**CER:** 4.4%
-**WER:** 13.3%
 ## Model Details
@@ -127,6 +143,9 @@ Wav2Vec2 is a state-of-the-art model architecture for speech recognition, levera
 python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
 ```
 The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
 ## Dataset
 ### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
@@ -138,6 +157,8 @@ The model is evaluated using a Language Model (LM) as post-processing. The utili
 ### License
 Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).
 ## Evaluation
 The model was evaluated using the following metrics:
@@ -246,9 +267,13 @@ The model was also tested against other datasets to evaluate generalizability:
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs)                         | 27.3        | 7.9   | **26.4**    | **7.7**  |
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed                  | 16.6        | 6.3   | **15.6**    | **6.1**  |
 ## Training curves
 <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">
 ## Creators and Funders
 This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).

 >>> transcriber(audio)
 {'text': 'your transcription'}
 ```
+---
+## Transcription Examples
+Explore the following audio samples along with their transcriptions and accuracy metrics. Each example showcases the model's performance with different Danish dialects.
+### Example 1 - Vestjysk Dialect
+**Audio Sample:**
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example1.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
+**Model Transcription:**
+*det blev til yderlig ti mål i den første sæson på trods af en position som back*
+**Target Transcription:**
+*det blev til yderligere ti mål i den første sæson på trods af en position som back*
+- **Character Error Rate (CER):** 3.7%
+- **Word Error Rate (WER):** 5.9%
+---
+### Example 2 - Sønderjysk Dialect
+**Audio Sample:**
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example2.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
+**Model Transcription:**
+*en arkitektoniske udformning af pladser forslagene iver benzen*
+**Target Transcription:**
+*den arkitektoniske udformning af pladsen er forestået af ivar bentsen*
+- **Character Error Rate (CER):** 20.3%
+- **Word Error Rate (WER):** 60.0%
+---
+### Example 3 - Nordsjællandsk Dialect
+**Audio Sample:**
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example3.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
+**Model Transcription:**
+*østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*
+**Target Transcription:**
+*østrig og ungarn samarbejder om søen gennem den østrigske og ungarske vandkommission*
+- **Character Error Rate (CER):** 0.0%
+- **Word Error Rate (WER):** 0.0%
+---
+### Example 4 - Lollandsk Dialect
+**Audio Sample:**
 <audio controls>
   <source src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/audio_samples/example4.wav" type="audio/wav">
   Your browser does not support the audio tag.
 </audio>
+**Model Transcription:**
+*det er produceret af thomas helme og indspillede i easy sound recording studio i københavn*
+**Target Transcription:**
+*det er produceret af thomas helmig og indspillet i easy sound recording studio i københavn*
+- **Character Error Rate (CER):** 4.4%
+- **Word Error Rate (WER):** 13.3%
+---
 ## Model Details
 python src/scripts/finetune_asr_model.py model=wav2vec2-small max_steps=30000 datasets.coral_conversation_internal.id=CoRal-dataset/coral-v2 datasets.coral_readaloud_internal.id=CoRal-dataset/coral-v2
 ```
 The model is evaluated using a Language Model (LM) as post-processing. The utilized LM is the one trained and used by [alexandrainst/roest-wav2vec2-315m-v1](https://huggingface.co/alexandrainst/roest-315m).
+---
 ## Dataset
 ### [CoRal-v2](https://huggingface.co/datasets/CoRal-dataset/coral-v2/tree/main)
 ### License
 Note that the dataset used is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions (speech synthesis and biometric identification). See [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE).
+---
 ## Evaluation
 The model was evaluated using the following metrics:
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs)                         | 27.3        | 7.9   | **26.4**    | **7.7**  |
 | [Fleurs-da_dk](https://huggingface.co/datasets/google/fleurs) Normed                  | 16.6        | 6.3   | **15.6**    | **6.1**  |
+---
 ## Training curves
 <img src="https://huggingface.co/CoRal-dataset/roest-wav2vec2-315m-v2/resolve/main/images/training_plots.png">
+---
 ## Creators and Funders
 This model has been trained and the model card written by Marie Juhl Jørgensen and Søren Vejlgaard Holm at [Alvenir](https://www.alvenir.ai/).
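
The per-example CER and WER figures in the transcription examples can be checked by hand. The sketch below is one plausible way to do that, not the project's evaluation code: it assumes both metrics are plain Levenshtein edit distance divided by the length of the target (words for WER, characters including spaces for CER), with no text normalization. The helper names `edit_distance`, `wer`, and `cer` are illustrative and do not come from the model card. Under those assumptions it reproduces the figures reported for Example 1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(target, transcription):
    """Word error rate: word-level edits divided by target word count (assumed definition)."""
    ref_words = target.split()
    return edit_distance(ref_words, transcription.split()) / len(ref_words)


def cer(target, transcription):
    """Character error rate: character-level edits divided by target length (assumed definition)."""
    return edit_distance(target, transcription) / len(target)


# Example 1 (Vestjysk) from the model card
target = "det blev til yderligere ti mål i den første sæson på trods af en position som back"
transcription = "det blev til yderlig ti mål i den første sæson på trods af en position som back"

print(f"CER: {cer(target, transcription):.1%}")  # CER: 3.7%
print(f"WER: {wer(target, transcription):.1%}")  # WER: 5.9%
```

Here one substituted word out of 17 gives the 5.9% WER, and three missing characters out of 82 give the 3.7% CER. The aggregate scores in the evaluation tables may use different normalization, so this only illustrates the per-example numbers.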